What are the benefits of having separate layers or a dedicated schema for each layer in ETL?


Hi community,

I am a solution architect at a global tech company and have over ten years of experience.

What are the benefits of having one dedicated layer for staging, one for Type 1 persistent tables, and a dedicated schema/layer for the tables of the dimensional model?

Why is it not recommended to keep all tables in one schema?

Thanks!

I appreciate your help.


RajneeshShukla

Solution Architect at a tech vendor with 10,001+ employees


6 Answers

Last answered Oct 1, 2021


score 2 · Answer 1 · 2021-09-12T21:51:07Z

I have over 15 years of ETL experience on real-world projects and now teach graduate courses on the business intelligence life cycle.

In my view, it does not make sense to separate the ETL process into "layers". What I do recommend is that all tables appear in a star schema diagram, so that it is clear which tables must be handled early in the TL (Transform and Load) steps: their rows must already be in place when records loaded afterward reference them through foreign keys. Rows in dimension tables also persist longer than rows in fact tables.
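To make the load-order point concrete, here is a minimal sketch that derives the order from foreign-key dependencies. The table names are hypothetical, and it assumes Python 3.9+ for the standard-library `graphlib` module.

```python
from graphlib import TopologicalSorter

# Hypothetical star schema: each table maps to the tables it references
# through foreign keys (names are illustrative, not from this thread).
foreign_keys = {
    "dim_date": set(),
    "dim_customer": set(),
    "dim_product": set(),
    "fact_sales": {"dim_date", "dim_customer", "dim_product"},
}

# static_order() yields each table only after the tables it depends on,
# so the dimensions come out before the fact table that references them.
load_order = list(TopologicalSorter(foreign_keys).static_order())

for step, table in enumerate(load_order, start=1):
    print(f"{step}. load {table}")
```

The order among the three dimensions is not significant; the only guarantee is that `fact_sales` comes after all of them.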

Gouri Mishra Teradata FSLDM Consultant at TIK IT Solutions · Answer 2 · 2020-05-13T06:34:55Z

Here are some of the advantages of managing data in different layers:
1. It provides logical separation of data between the layers.
2. Maintenance such as backup, recovery, or applying data model changes can be done per layer.
3. From a data security perspective, only authorized people can work in their respective layer.
4. Space can be allocated for each layer independently.

On a practical level, it gives you the freedom to work on each layer independently; putting them all together would be a project nightmare.
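As a rough illustration of that independence, the sketch below uses Python's built-in `sqlite3` module. SQLite has no `CREATE SCHEMA`, so attached in-memory databases stand in for the three layers; that substitution, along with the layer, table, and column names, is an assumption made only for this example.

```python
import sqlite3

# Attached in-memory databases stand in for schemas (assumption for this sketch).
conn = sqlite3.connect(":memory:")
for layer in ("staging", "persistent", "mart"):
    conn.execute(f"ATTACH DATABASE ':memory:' AS {layer}")

# Staging layer: raw rows exactly as extracted from the source.
conn.execute("CREATE TABLE staging.customer_raw (id INTEGER, name TEXT)")
conn.executemany(
    "INSERT INTO staging.customer_raw VALUES (?, ?)",
    [(1, " alice "), (2, "BOB")],
)

# Persistent (type 1) layer: cleansed rows, overwritten when they change.
conn.execute("CREATE TABLE persistent.customer (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "INSERT OR REPLACE INTO persistent.customer "
    "SELECT id, lower(trim(name)) FROM staging.customer_raw"
)

# Dimensional layer: the tables that reporting users query.
conn.execute(
    "CREATE TABLE mart.dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT)"
)
conn.execute("INSERT INTO mart.dim_customer SELECT id, name FROM persistent.customer")

# Staging can be emptied without touching the other layers.
conn.execute("DELETE FROM staging.customer_raw")
staging_rows = conn.execute("SELECT count(*) FROM staging.customer_raw").fetchone()[0]
mart_rows = conn.execute(
    "SELECT name FROM mart.dim_customer ORDER BY customer_key"
).fetchall()
print(staging_rows, mart_rows)
```

In a real warehouse each layer would be a proper schema with its own backup policy, quotas, and grants, which is where points 2 to 4 above come in.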

Data Sleek Owner at a consultancy with 1-10 employees · Answer 3 · 2021-10-01T22:35:42Z

Traditional ETL would usually use a dedicated database (or even a dedicated database server) where you load and transform your raw data before ingesting it into the final destination. This lets you check the data before it reaches its final destination.

The arrival of cloud data warehouses like Snowflake has changed the landscape for data transformation pipelines in the DW. The DW has also become a data lake where all raw data is stored. Using a transformation tool like dbt, you can build your fact and dimension tables inside the warehouse, reading data from RAW and writing it to its final destination.

For your raw data, it does make sense to separate the sources into different schemas.
You can also separate your final destination into different schemas: one for Finance, one for Product, one for Marketing. You can then grant access at the schema level for each role, which makes permissions much easier to manage.
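A small sketch of what schema-level permissions can look like: the helper below only builds the GRANT statements as strings. The database, schema, and role names are hypothetical, and the statements follow Snowflake-style syntax as I understand it, so check your own warehouse's documentation before running anything like this.

```python
def schema_grants(database: str, schema_roles: dict[str, str]) -> list[str]:
    """Build read-only GRANT statements, one pair per schema/role mapping."""
    statements = []
    for schema, role in schema_roles.items():
        target = f"{database}.{schema}"
        # The role needs USAGE on the schema before it can read its tables.
        statements.append(f"GRANT USAGE ON SCHEMA {target} TO ROLE {role};")
        statements.append(
            f"GRANT SELECT ON ALL TABLES IN SCHEMA {target} TO ROLE {role};"
        )
    return statements


# Hypothetical mapping: one schema per business area, one reader role each.
grants = schema_grants(
    "analytics",
    {
        "finance": "finance_reader",
        "product": "product_reader",
        "marketing": "marketing_reader",
    },
)
print("\n".join(grants))
```

With everything in one schema you would have to grant table by table instead, which is the maintenance burden the schema split avoids.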

Check out getdbt.com

Djalma Gomes, Pmp, Mba Managing Partner at Data Pine · Answer 4 · 2021-09-14T13:20:16Z

The main reason is security and governance.

Most of the time, you are required to perform different actions on the data. Cleansing the data and adapting it to naming standards are common, and they can happen in different steps.

Having different schemas helps to prevent unwanted mistakes.

Joe Fernandes Asst. Senior IT Manager at a retailer with 5,001-10,000 employees · Answer 5 · 2021-09-13T10:52:53Z

From a business perspective, it is recommended to extract data from the source system only once. In large organisations, there may be several fields in various tables that are not required for reporting immediately. So while all data is extracted from the source system to the Persistent Storage Area in the target system, one could store this data in a staging layer in the target system for current or future use. The staging layer would typically contain all the data extracted, or it could be filtered and transformed as required. The Persistent Storage Area is typically cleared within 15 days. The staging layer in the target system also serves as a backup of all data in case your source system is down.

The advantage of building another layer above the staging layer is that data can be transformed further and loaded in a form that makes sense for the business. An example would be creating a value field as quantity × rate, or applying any other formula, such as computing a discount from other fields. Only the fields required for reporting need to be stored in this permanent layer.
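The quantity × rate example can be sketched in a few lines of Python. The record layouts here are hypothetical, and the percentage-based discount is just one assumed formula; the point is that the reporting record keeps only the fields needed downstream.

```python
from dataclasses import dataclass


@dataclass
class StagedOrderLine:
    """Row as held in the staging layer (hypothetical layout)."""
    order_id: int
    quantity: int
    rate: float
    discount_pct: float
    internal_note: str  # present in staging, not needed for reporting


@dataclass
class ReportingOrderLine:
    """Row as stored in the reporting layer: derived fields only."""
    order_id: int
    value: float
    discount: float


def to_reporting(line: StagedOrderLine) -> ReportingOrderLine:
    # value = quantity x rate; discount is computed from other staged fields.
    value = line.quantity * line.rate
    discount = value * line.discount_pct / 100
    return ReportingOrderLine(line.order_id, value, discount)


staged = StagedOrderLine(
    order_id=10, quantity=3, rate=20.0, discount_pct=10.0, internal_note="rush"
)
reported = to_reporting(staged)
print(reported)
```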

Further layers for data targets can be created as required, depending on performance and reporting considerations and on the reporting tool one uses.

The multi-layer architecture described above was for older installations.

Newer setups have features that allow one to construct a view on one or more tables and report on it directly.
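As a rough sketch of the view-based approach, the example below defines a view over two tables and queries it directly, using Python's built-in `sqlite3` module as a stand-in for the warehouse. All table, column, and view names and the sample rows are made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE customers (customer_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (
        order_id INTEGER PRIMARY KEY,
        customer_id INTEGER,
        quantity INTEGER,
        rate REAL
    );
    INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
    INSERT INTO orders VALUES (10, 1, 3, 20.0), (11, 1, 1, 5.0), (12, 2, 2, 7.5);

    -- The view joins and aggregates at query time; no extra layer is loaded.
    CREATE VIEW v_sales_by_customer AS
        SELECT c.name AS customer, SUM(o.quantity * o.rate) AS value
        FROM orders AS o
        JOIN customers AS c ON c.customer_id = o.customer_id
        GROUP BY c.name;
    """
)

# The reporting tool would query the view as if it were a table.
report = conn.execute(
    "SELECT customer, value FROM v_sales_by_customer ORDER BY customer"
).fetchall()
print(report)
```

The trade-off is that the join and aggregation run on every query, which is one reason materialized layers are still used where performance matters.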

RajneeshShukla Solution Architect at a tech vendor with 10,001+ employees · Answer 6 · 2020-05-20T15:37:34Z


Thank you, Gouri!
