2020-04-27T05:41:00Z

What are the benefits of having separate layers or a dedicated schema for each layer in ETL?

RS
  • 7
  • 576
PeerSpot user
6

6 Answers

BL
Real User
2021-09-12T21:51:07Z
Sep 12, 2021

I have over 15 years of ETL experience on "real world" projects & am now teaching grad courses in the Business Intelligence life cycle. 


It does not make any sense to separate the ETL process into "layers". I do recommend that "all" tables do show in a star schema diagram so that it can be understood which must be handled early in TL (transformation & Load) steps so that rows are in place when needed to be joined to foreign keys in records that are handled afterward. Dimension table rows do persist longer than those identified as fact tables. 

Search for a product comparison in Cloud Data Warehouse
GM
Real User
2020-05-13T06:34:55Z
May 13, 2020

Here are some of the advantages of managing data in different layers:
1. Provides logical separation of data between different layers
2. For any maintenance of each layer such as backup or recovery or data model change apply, etc. are layer dependent
3. From data security perspective, only authorized resource can work in their respective layer
4. Space allocation for each layer can be done independently

On a practical point, it gives freedom to work each layer independently and putting them together will be a project nightmare.

MA
Real User
2021-10-01T22:35:42Z
Oct 1, 2021

Traditional ETL would usually use a dedicated database (or even database server) where you'll load & transform your raw data before ingesting it into the final destination. This would allow checking data before its final destination. 


Data transformation pipeline in DW with the arrival of Cloud Data Warehouse like snowflake has changed the landscape. The DW has also become a data lake where all raw data is stored. Using a transformation tool like DBT, you could build your fact and dimension tables, therefore are able to grab data from RAW and send them to its final destination. 


For your raw data, it does make sense to separate the sources into different schemas.
You can also separate your final destination into different schemas too. 
One for Finance, one for Product, one for Marketing. This is all you can grant at the schema level for each role. Much easier to manage for permissions.

Check out getdbt.com

DG
Vendor
2021-09-14T13:20:16Z
Sep 14, 2021

The main reason is security and governance. 


Most of the time, you are required to perform different actions on the data. Cleansing and adapting it to naming standards is pretty common and this could happen in different steps.

Having different schemas helps to prevent unwanted mistakes. 

JF
Real User
2021-09-13T10:52:53Z
Sep 13, 2021

From a business perspective, it is recommended to extract data from the source system only once. In large organisations, there may be several fields in various tables which will not be required for reporting immediately. So while all data is extracted from the source system to the Persistent Storage Area in the target system, one could store this data in a staging layer in the target system for current / future use. The staging layer would typically contain all data extracted or could be filtered and transformed as required. The Persistent storage area is typically cleared within 15 days.The Staging layer in the target system also serves as a backup of all data in case your source system is down.


The advantage in building another layer above the staging layer will be to transform data further and load it in a form which will make sense for business. An example would be where one creates a value field as quantity × rate or any other formula such as computation of a discount from other fields. Only fields required for reporting could be stored in this permanent layer. 


Further layers for data targets as required could be created depending on Performance and reporting considerations and the Reporting tool one uses.


The above multi layer architecture described was for older installations.


Newer setups have features which allow one to construct a view based on a single/multiple tables and report directly.




RS
Real User
2020-05-20T15:37:34Z
May 20, 2020

Thank you Gouri !!!

Learn what your peers think about Microsoft Azure Synapse Analytics. Get advice and tips from experienced pros sharing their opinions. Updated: March 2024.
763,955 professionals have used our research since 2012.
Microsoft Azure Synapse Analytics is an end-to-end analytics solution that successfully combines analytical services to merge big data analytics and enterprise data warehouses into a single unified platform. The solution can run intelligent distributed queries among nodes, and provides the ability to query both relational and non-relational data. Microsoft Azure Synapse Analytics is built with these 4 components: Synapse SQL Spark Synapse Pipeline Studio Microsoft Azure Synapse Analytics...
Download Microsoft Azure Synapse Analytics ReportRead more

Related Q&As

Related articles

Cloud Data Warehouse experts

Joaquin Marques - PeerSpot reviewer
Atif Tariq - PeerSpot reviewer
Miodrag Milojevic - PeerSpot reviewer
Ravi Kuppusamy - PeerSpot reviewer
Kevin McAllister - PeerSpot reviewer
VinayKumar2 - PeerSpot reviewer
Kumar Madhur - PeerSpot reviewer
Rohit Sircar - PeerSpot reviewer