2021-10-09T05:32:00Z

What are the main differences between Data Lake and Data Warehouse?


What are the relations between them? What are their use cases?

ITCS user
Guest
66 Answers

User

Many of the comparisons of data lake and data warehouse that you see (such as the one below from Talend) are based on an out-of-date or dumbed-down idea of the data warehouse.  


The more advanced data warehouse engines:


- support a wide range of data types and formats


- can access external data (e.g., in object storage) that has never been ingested


- support data scientists as well as business users (e.g., with an ability to run Python, R, SAS routines and data science libraries on data in place in parallel in the data warehouse)


- support operational query on live, rapidly changing data


They do all this while also providing capabilities and services never provided on data lakes or their cloud-based equivalents. Data warehouses, properly operated and housing data that is properly curated, are much more efficient, cost-effective and performant for data that is intensively shared and widely used.


Data lakes are good repositories for data that is more lightly or locally used and does not merit the level of curation usually desired in a data warehouse.

2021-10-11T16:51:09Z
Top 5, Vendor

Those are just terms used by marketing. 30 years ago, BI required a pre-aggregated database designed as cubes. Dimensions and hierarchies were pre-defined, and we used to call those cubes data warehouses. An example of this old-school BI is SAP BW. The pre-aggregation was required to provide speed. Over time, vendors of transactional systems started to use the term data warehouse as well, so you could hear terms like "tax data warehouse" (for packages that deal with tax information) and "client data warehouse" (for CRMs and packages that deal with customer information).


We don't need cubes to provide speed anymore. Data lake is a newer term, related to big data, for a repository that no longer requires rigid structures such as cubes to provide speed. We now divide data lakes into HOT, WARM and COLD tiers depending on information availability needs. In the old days, data warehouse meant structured data, but nowadays you may have unstructured data as well and use the Hadoop ecosystem to store it securely and at low cost; that is all part of the data lake.
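As a rough sketch of the HOT/WARM/COLD split, a tier can be assigned from how recently a dataset was accessed. The thresholds (7 and 90 days) and the function name `storage_tier` are assumptions made for illustration; real policies depend on each company's availability needs.

```python
from datetime import datetime, timedelta

# Assumed thresholds, chosen only for illustration.
HOT_WINDOW = timedelta(days=7)
WARM_WINDOW = timedelta(days=90)


def storage_tier(last_accessed: datetime, now: datetime) -> str:
    """Return HOT, WARM or COLD based on the time since last access."""
    age = now - last_accessed
    if age <= HOT_WINDOW:
        return "HOT"
    if age <= WARM_WINDOW:
        return "WARM"
    return "COLD"


now = datetime(2021, 10, 11)
tiers = [
    storage_tier(datetime(2021, 10, 9), now),   # accessed 2 days ago
    storage_tier(datetime(2021, 8, 1), now),    # accessed about 10 weeks ago
    storage_tier(datetime(2020, 1, 1), now),    # accessed well over a year ago
]
```

Last-access age is only one possible proxy for availability needs; query latency targets or retention rules could drive the same decision.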



Most companies use only structured data, and they call their single data repository a data lake once the data has been treated, cleaned and shaped to standards (such as field and table naming standards). Others have layered data lakes (raw data, cleansed data, etc.). Having different layers is very useful for data governance, since the volume and speed of change affect the company's policies.
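A minimal sketch of promoting records from a raw layer to a cleansed layer might look like the following. The snake_case naming standard, the `required` field, and the helper names `to_snake_case` and `promote` are all assumptions for illustration, not a description of any particular product.

```python
import re


def to_snake_case(name: str) -> str:
    """Normalize a field name to snake_case (one possible naming standard)."""
    name = re.sub(r"[^0-9A-Za-z]+", "_", name.strip())
    name = re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", name)
    return name.strip("_").lower()


def promote(raw_records, required=("customer_name",)):
    """Raw layer -> cleansed layer: rename fields, trim strings, drop incomplete rows."""
    cleansed = []
    for record in raw_records:
        row = {
            to_snake_case(key): (value.strip() if isinstance(value, str) else value)
            for key, value in record.items()
        }
        if all(row.get(field) not in (None, "") for field in required):
            cleansed.append(row)
    return cleansed


# The raw layer is kept untouched; the cleansed layer is derived from it.
raw_layer = [
    {"Customer Name": " Alice ", "orderID": 1},
    {"Customer Name": "", "orderID": 2},  # missing required value, dropped on promotion
]
cleansed_layer = promote(raw_layer)
```

Keeping the raw layer unmodified means the cleansing rules can change later and the cleansed layer can simply be rebuilt.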


All the best.

2021-11-19T11:55:01Z
Top 5, Leaderboard, Real User

I assume the term "Data Warehouse" is used here in its traditional/conventional sense.


In such a case, the main differences are: 


1) A traditional data warehouse hosts only structured or semi-structured data, while a data lake hosts all kinds of data, including unstructured data such as video and audio.


2) A data warehouse is a "house": it is built in advance to hold data that fits a known purpose. In a data lake, the data arrives first, and the "house, camp or tent" is built afterwards to fit on-the-fly demand. As a result, the data lake's "house" can (theoretically) scale without limit.


3) A data warehouse is populated via ETL (extract, transform, load), while a data lake is populated via ELT (extract, load, transform).
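The ETL/ELT distinction in point 3 can be sketched with Python's built-in `sqlite3` module standing in for the storage engine. The sample rows, table names and the `clean` helper are hypothetical; the only point is the order of the steps: ETL transforms before loading, ELT loads raw data and transforms inside the engine afterwards.

```python
import sqlite3

# Hypothetical raw records, as they might arrive from a source system.
RAW_ROWS = [
    ("2021-10-01", " alice ", "12.50"),
    ("2021-10-02", "BOB", "7.25"),
    ("2021-10-02", "alice", "not-a-number"),  # malformed amount
]


def clean(row):
    """Transform step: normalize the name and parse the amount, or return None."""
    day, name, amount = row
    try:
        return (day, name.strip().lower(), float(amount))
    except ValueError:
        return None


def etl(conn):
    """ETL: transform in application code first, then load only curated rows."""
    conn.execute("CREATE TABLE sales (day TEXT, customer TEXT, amount REAL)")
    curated = [r for r in map(clean, RAW_ROWS) if r is not None]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", curated)


def elt(conn):
    """ELT: load everything as-is, then transform inside the engine via a view."""
    conn.execute("CREATE TABLE raw_sales (day TEXT, customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?, ?)", RAW_ROWS)
    conn.execute(
        """
        CREATE VIEW sales AS
        SELECT day,
               lower(trim(customer)) AS customer,
               CAST(amount AS REAL) AS amount
        FROM raw_sales
        WHERE amount GLOB '[0-9]*'  -- crude check: keep amounts starting with a digit
        """
    )


def totals(conn):
    """Same query for both pipelines: total amount per customer."""
    return conn.execute(
        "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
    ).fetchall()


def run(pipeline):
    conn = sqlite3.connect(":memory:")
    pipeline(conn)
    return conn, totals(conn)


etl_conn, etl_totals = run(etl)
elt_conn, elt_totals = run(elt)
```

Both pipelines answer the query identically, but only the ELT variant still holds the malformed row in `raw_sales`, so the transformation can be revised later without re-extracting from the source.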

2021-10-12T02:17:19Z
Top 5, Leaderboard, Real User

The relationship between them depends on how one implements the data lake.


A data lake can contain the data warehouse, or the data warehouse can supplement the data lake to meet traditional reporting and ad hoc analysis needs. One example is Databricks' Lakehouse implementation.

2021-10-12T02:23:17Z
User

In my other answer, I meant to add: for some types of data, putting the data in the data warehouse does not add much value. An example would be a large audio or video file.


In this case, it makes sense to store the raw data objects in a data lake or similar platform and, if it would be valuable to integrate with other data, store the extracted features in the data warehouse. 
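A minimal sketch of that split, assuming a local directory as a stand-in for the data lake and an in-memory SQLite database as a stand-in for the data warehouse. The file name, table name and the "features" (size and checksum) are placeholders; real audio or video feature extraction would replace `extract_features`.

```python
import hashlib
import sqlite3
import tempfile
from pathlib import Path


def store_raw(lake_dir: Path, name: str, payload: bytes) -> Path:
    """Write the raw object to the 'lake' (a local directory stands in for object storage)."""
    path = lake_dir / name
    path.write_bytes(payload)
    return path


def extract_features(path: Path) -> dict:
    """Placeholder extraction: size and checksum stand in for real media features."""
    data = path.read_bytes()
    return {"size_bytes": len(data), "sha256": hashlib.sha256(data).hexdigest()}


def register(conn, path: Path, features: dict) -> None:
    """Store only the small, queryable features plus a pointer back to the raw object."""
    conn.execute(
        "INSERT INTO media_features VALUES (?, ?, ?)",
        (str(path), features["size_bytes"], features["sha256"]),
    )


def demo():
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE media_features "
        "(lake_path TEXT PRIMARY KEY, size_bytes INTEGER, sha256 TEXT)"
    )
    with tempfile.TemporaryDirectory() as tmp:
        path = store_raw(Path(tmp), "call_0001.wav", b"\x00\x01" * 500)
        register(conn, path, extract_features(path))
    return conn.execute("SELECT lake_path, size_bytes FROM media_features").fetchall()


rows = demo()
```

The warehouse table stays small and joinable with other data, while the large object itself never leaves the lake.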

2021-10-11T18:51:03Z
Expert, Moderator, Real User

Hi @Evgeny Belenky - great question.


Here is the best answer, crafted by Talend:

| | Data Lake | Data Warehouse |
| --- | --- | --- |
| Data Structure | Raw | Processed |
| Purpose of Data | Not Yet Determined | Currently In Use |
| Users | Data Scientists | Business Professionals |
| Accessibility | Highly accessible and quick to update | More complicated and costly to make changes |

Please read more here https://www.talend.com/resourc...

2021-10-10T07:57:12Z