2020-05-12T05:03:00Z

Are there any general guidelines to allocate table space quota to different layers in ETL?


I am a solution architect for a global tech vendor.

I am looking for any general guidelines to allocate tablespace quota to different layers/schemas in the ETL flow of a data warehouse (% of total space in each layer).

I have these 4 layers:

  1. Staging -truncate and load data from source files
  2. ODS- Type 1 persistent tables
  3. Transformation layer- similar to final DWH layer but truncate before loading newly arrived data
  4. DWH layer- Final dimensional model layer

I understand space requirements may vary based on project requirements, however, any general guideline (if any such in data warehousing and ETL space) to estimate the space will be helpful.

Thanks! I appreciate your help.

Guest
22 Answers

author avatar
Top 5Real User

Space requirement will be directly dependent on design considerations and data requirements. Any estimation without detailed information is bound to be faulty and should be used as reference only.
Here are my recommendation on how to go about it:
1. Staging
Estimating Staging will be dependent on the following:
1. Understanding of sources
- Each source, amount of data it's going to provide
- Regular full/ incremental data and special days data volume

Having said these, typically 5 % of space should do for incremental data loads.

2. ODS- Type 1 persistent tables

Estimating ODS will be dependent on the following:
1. Understanding of sources
- Each source, amount of data it's going to provide to ODS
- Regular full/ incremental data and special days data volume

2. data retention period is a critical as space requirement can increase twice if retention days is doubled

Having said these, typically 10 - 15 % of space with 14 days retention will be reasonable.

3. Transformation layer- similar to final DWH layer but truncate before loading newly arrived data
Estimating Transformation will be dependent on the following:
1. Modeling strategy used
2. Number of additional derived information is maintained
3. Data volatility
4. Audit information captured

Since it's a truncate and load, expect it to be the size of approximately 5 - 10 %

4. DWH layer- Final dimensional model layer
- Typically 70 - 80 % of space is allocated to DWH.
- Dimension, Facts, Aggregate tables, Indexes, etc. should be carefully considered for estimation
- Data retention criteria (if any) should also be considered

Other key considerations:
1. Special attention to space requirements for historical data load should be considered. Most projects miss this important step and struggle at the end.
2. The system should also be estimated for different project environments. Typically Dev (10 %), System Test(5 %), UAT(15 %) of entire space should be allocated
3. 10 % of data growth on a year on year basis for ODS/ Staging/ Transformation layer should be considered
4. The growth of DWH will be dependent on modeling techniques and data volatility. These must be considered for the next few years
5. Some buffer should also be retained for any unforeseen circumstance.

2020-05-13T06:21:21Z
author avatar
Top 5Real User

Thank you Gouri, it helps !!

2020-05-15T09:36:01Z
Find out what your peers are saying about Snowflake Computing, Microsoft, Amazon and others in Cloud Data Warehouse. Updated: November 2020.
448,076 professionals have used our research since 2012.