2020-04-27T05:41:00Z

What are the benefits of having separate layers or a dedicated schema for each layer in ETL?


Hi community,

I am a solution architect for a global tech company with over ten years of experience.

What are the benefits of having separate layers or a dedicated schema for each layer in ETL? 

One dedicated layer for staging, one for type 1 persistent tables, and a dedicated schema/layer for the dimensional model tables?

Why is it not recommended to have all tables in one schema?

Thanks! 

I appreciate your help. 

ITCS user
Guest
66 Answers

Top 5, Leaderboard, Real User

I have over 15 years of ETL experience on "real world" projects and now teach graduate courses on the Business Intelligence life cycle.


I don't think it makes sense to separate the ETL process itself into "layers". I do recommend that all tables appear in a single star schema diagram, so it is clear which tables must be handled early in the T and L (transform and load) steps: dimension rows need to be in place before the fact records that reference them through foreign keys are loaded. Dimension table rows also persist longer than fact table rows.
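To make the load-order point concrete, here is a minimal Python sketch; the table names (`staged_products`, `staged_sales`) and the surrogate-key handling are hypothetical, not from any particular tool. Dimension rows are loaded first so that fact rows can resolve their foreign keys.

```python
# Minimal sketch: load dimension rows before fact rows so foreign keys resolve.
# Table and variable names here are hypothetical, not from any specific tool.

staged_products = [{"product_code": "P1", "name": "Widget"}]
staged_sales = [{"product_code": "P1", "quantity": 3}]

dim_product = {}            # surrogate key -> dimension row
product_key_by_code = {}    # natural key  -> surrogate key

# 1. Load the dimension first and assign surrogate keys.
for surrogate_key, row in enumerate(staged_products, start=1):
    dim_product[surrogate_key] = {"product_key": surrogate_key, **row}
    product_key_by_code[row["product_code"]] = surrogate_key

# 2. Only then load facts, looking up the dimension's surrogate key.
fact_sales = []
for row in staged_sales:
    key = product_key_by_code.get(row["product_code"])
    if key is None:
        raise ValueError(f"Load dimensions first: unknown product {row['product_code']}")
    fact_sales.append({"product_key": key, "quantity": row["quantity"]})

print(fact_sales)   # [{'product_key': 1, 'quantity': 3}]
```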

2021-09-12T21:51:07Z
Top 10, Real User

Here are some of the advantages of managing data in different layers:
1. It provides a logical separation of data between the layers.
2. Maintenance of each layer, such as backup, recovery, or applying a data model change, can be handled per layer.
3. From a data security perspective, only authorized people can work in their respective layer.
4. Space allocation for each layer can be done independently.

On a practical note, separate layers give teams the freedom to work on each layer independently; putting everything together in one schema would make the project a nightmare to manage.
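As a rough illustration of that separation, here is a hedged sketch of one schema per layer, assuming a PostgreSQL-style warehouse and a DB-API connection; the schema and role names are made up for the example.

```python
# Sketch: one schema per ETL layer so backups, grants, and storage are managed per layer.
# Schema and role names are hypothetical; `conn` is assumed to be a DB-API connection
# (e.g. psycopg2) to a PostgreSQL-style warehouse.

LAYER_DDL = [
    "CREATE SCHEMA IF NOT EXISTS staging;",      # raw extracts, reloaded each run
    "CREATE SCHEMA IF NOT EXISTS persistent;",   # type 1 persistent tables
    "CREATE SCHEMA IF NOT EXISTS dimensional;",  # star schema used for reporting
    # Security: each role only touches its own layer.
    "GRANT USAGE ON SCHEMA staging TO etl_role;",
    "GRANT USAGE ON SCHEMA dimensional TO reporting_role;",
]

def create_layers(conn):
    """Run the per-layer DDL in a single transaction."""
    with conn.cursor() as cur:
        for statement in LAYER_DDL:
            cur.execute(statement)
    conn.commit()
```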

2020-05-13T06:34:55Z
Top 5, Leaderboard, User

Traditional ETL would usually use a dedicated database (or even a dedicated database server) where you load and transform your raw data before ingesting it into the final destination. This lets you check the data before it reaches its final destination.


The arrival of cloud data warehouses like Snowflake has changed the landscape for data transformation pipelines. The warehouse has also become a data lake where all raw data is stored. Using a transformation tool like dbt, you can build your fact and dimension tables straight from the raw data and send them to their final destination.


For your raw data, it does make sense to separate the sources into different schemas.
You can also separate your final destination into different schemas: one for Finance, one for Product, one for Marketing. Access can then be granted at the schema level for each role, which makes permissions much easier to manage.
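A hedged sketch of that permission pattern, with made-up schema and role names and Snowflake-style GRANT syntax, executed through a DB-API connection:

```python
# Sketch: grant access at the schema level so each department role sees only its own mart.
# Schema and role names are hypothetical; the SQL assumes a Snowflake-style dialect.

DEPARTMENT_ROLES = {
    "finance": "finance_analyst",
    "product": "product_analyst",
    "marketing": "marketing_analyst",
}

def grant_schema_access(conn):
    """One pair of GRANTs per department schema/role."""
    with conn.cursor() as cur:
        for schema, role in DEPARTMENT_ROLES.items():
            cur.execute(f"GRANT USAGE ON SCHEMA {schema} TO ROLE {role};")
            cur.execute(f"GRANT SELECT ON ALL TABLES IN SCHEMA {schema} TO ROLE {role};")
    conn.commit()
```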

Check out getdbt.com

2021-10-01T22:35:42Z
Top 5, Vendor

The main reasons are security and governance.


Most of the time, you are required to perform different actions on the data. Cleansing it and adapting it to naming standards is pretty common, and this can happen in several steps.

Having different schemas helps to prevent unwanted mistakes. 
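For example, a cleansing step that enforces a naming standard might sit between two schemas; here is a minimal sketch, where the lower_snake_case convention is just an assumed standard:

```python
# Sketch: standardize column names and trim values while moving rows between schemas.
# The naming standard (lower_snake_case) is an assumption for the example.

import re

def to_standard_name(raw_name: str) -> str:
    """e.g. 'Customer ID' -> 'customer_id'."""
    name = re.sub(r"[^0-9a-zA-Z]+", "_", raw_name.strip())
    return name.strip("_").lower()

def cleanse_row(raw_row: dict) -> dict:
    """Rename keys to the naming standard and strip whitespace from string values."""
    return {
        to_standard_name(key): (value.strip() if isinstance(value, str) else value)
        for key, value in raw_row.items()
    }

print(cleanse_row({"Customer ID": " C-42 ", "First Name": "Ada"}))
# {'customer_id': 'C-42', 'first_name': 'Ada'}
```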

2021-09-14T13:20:16Z
Top 20, Real User

From a business perspective, it is recommended to extract data from the source system only once. In large organisations there may be several fields in various tables that are not required for reporting immediately. So while all data is extracted from the source system into the persistent storage area in the target system, one can also keep this data in a staging layer in the target system for current or future use. The staging layer would typically contain all data extracted, or it could be filtered and transformed as required. The persistent storage area is typically cleared within 15 days. The staging layer in the target system also serves as a backup of all data in case your source system is down.


The advantage of building another layer above the staging layer is that data can be transformed further and loaded in a form that makes sense for the business. An example would be creating a value field as quantity × rate, or any other formula, such as computing a discount from other fields. Only the fields required for reporting need to be stored in this permanent layer.
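A minimal sketch of that kind of derived field, computed while moving rows from the staging layer into the reporting layer (the table and column names are made up):

```python
# Sketch: derive value = quantity * rate when loading the reporting layer,
# keeping only the fields that reporting actually needs. Names are hypothetical.

staging_rows = [
    {"order_id": 1, "quantity": 5, "rate": 9.99, "source_file": "orders_2021.csv"},
    {"order_id": 2, "quantity": 2, "rate": 24.50, "source_file": "orders_2021.csv"},
]

reporting_rows = [
    {
        "order_id": row["order_id"],
        "quantity": row["quantity"],
        "rate": row["rate"],
        "value": round(row["quantity"] * row["rate"], 2),  # derived business field
        # 'source_file' is dropped: it is not needed for reporting
    }
    for row in staging_rows
]

print(reporting_rows[0])  # {'order_id': 1, 'quantity': 5, 'rate': 9.99, 'value': 49.95}
```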


Further layers for data targets can be created as required, depending on performance and reporting considerations and the reporting tool being used.


The multi-layer architecture described above applies to older installations.


Newer setups have features that allow one to build a view on top of one or more tables and report on it directly.
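A hedged sketch of that view-based approach, with hypothetical schema, table, and column names, again run through a DB-API connection:

```python
# Sketch: expose a reporting view directly over a staging table instead of
# maintaining another physical layer. All names are hypothetical.

REPORTING_VIEW_SQL = """
CREATE OR REPLACE VIEW reporting.sales_summary AS
SELECT
    o.order_id,
    o.quantity,
    o.rate,
    o.quantity * o.rate AS value
FROM staging.orders AS o;
"""

def create_reporting_view(conn):
    """Create (or refresh) the view; reporting tools then query it directly."""
    with conn.cursor() as cur:
        cur.execute(REPORTING_VIEW_SQL)
    conn.commit()
```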




2021-09-13T10:52:53Z
Top 5, Real User

Thank you, Gouri!

2020-05-20T15:37:34Z