
What is data lake storage?


What is the difference between Azure Data lake and BLOB storage?

ITCS user
77 Answers

author avatar

These two Microsoft solutions are compatible within the Azure centralized data repository framework. The data lake is a strategy by which all forms of data can be 'housed' at a single location. BLOB is a field type that accommodates the storage of differing data types; it is generically known as binary storage. They work together under Microsoft Azure's Data Center platform.

We are finding that a useful and less costly alternative to centralized data storage is to add a middle layer, known generically as data virtualization. It has many of the same benefits, but requires less compromise than centralized data storage solutions. The value proposition is complex to articulate outside a specific implementation, but I am happy to talk you through it.
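To make the "middle layer" idea a little more concrete, here is a minimal Python sketch of what data virtualization means in principle: the data stays in its original sources and a thin layer joins it at query time instead of copying it into one central store. The two sources (an in-memory SQLite table and a CSV string) and every name in the code are made up for illustration; this is not how any particular virtualization product works.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: an operational database (in-memory SQLite here).
orders_db = sqlite3.connect(":memory:")
orders_db.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
orders_db.executemany(
    "INSERT INTO orders VALUES (?, ?)", [(1, 20.0), (2, 35.5), (1, 4.5)]
)

# Hypothetical source 2: a CSV export owned by another team.
CUSTOMERS_CSV = "customer_id,name\n1,Ada\n2,Grace\n"


def fetch_orders():
    """Read orders from the database at query time; no copy is kept."""
    return orders_db.execute("SELECT customer_id, amount FROM orders").fetchall()


def fetch_customers():
    """Read customers from the CSV source at query time."""
    reader = csv.DictReader(io.StringIO(CUSTOMERS_CSV))
    return {int(row["customer_id"]): row["name"] for row in reader}


def total_spend_by_customer():
    """The 'virtual view': joins both sources on demand."""
    names = fetch_customers()
    totals = {}
    for customer_id, amount in fetch_orders():
        name = names[customer_id]
        totals[name] = totals.get(name, 0.0) + amount
    return totals


print(total_spend_by_customer())
```

The trade-off in this toy version is that every query goes back to the sources, so nothing is duplicated but the sources have to stay available and fast enough.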

Real User

A data lake is a data warehousing solution, whether it runs in the Azure cloud or as a data lake in AWS.

The data is stored as files and is retrieved either through code or scripts, or through one of the many ETL tools available today for reading data from data lakes. Microsoft itself provides Data Factory for this purpose.

Blob storage is similar to the S3 concept in AWS. It is a file system in the cloud where you can store files of any format, then retrieve, use, and transform them, and so on.
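As a rough illustration of the store / retrieve / transform idea, here is a toy in-memory object store in Python. It is only a stand-in to show the shape of the operations (a key mapped to raw bytes); the class name `ToyBlobStore` and its methods are invented for this sketch and are not the Azure or AWS SDK.

```python
from typing import Dict, List


class ToyBlobStore:
    """In-memory stand-in for an object store such as Blob storage or S3."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}  # key -> raw content

    def put(self, key: str, data: bytes) -> None:
        """Store (or overwrite) an object under the given key."""
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        """Return the raw bytes stored under the given key."""
        return self._objects[key]

    def list(self, prefix: str = "") -> List[str]:
        """List keys starting with a prefix, in sorted order."""
        return sorted(k for k in self._objects if k.startswith(prefix))


store = ToyBlobStore()

# Any file format can be stored, because the store only sees bytes.
store.put("reports/2021/sales.csv", b"region,total\nemea,10\n")
store.put("images/logo.png", b"\x89PNG")

# Retrieve, transform, and write the result back as a new object.
raw = store.get("reports/2021/sales.csv")
store.put("reports/2021/sales_upper.csv", raw.upper())

print(store.list("reports/"))
```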

author avatar

In summary, Azure Blob Storage is a general-purpose, scalable object store created for a variety of storage scenarios, whereas Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads. I recently came across this page: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage. I believe it provides the best answer to this question.

Real User

Instead of putting data in a database or a dimensional data warehouse, the best approach now is to put it in an in-memory computing grid such as Ignite to get fast in-memory computing power. A data lake is still complex.

Real User

Although practitioners still use data lakes to store raw data across the enterprise, a data lake is complicated and not neutral. Any storage mechanism, a cloud data lake included, is inefficient compared with in-memory computing such as GridGain Ignite. Processing data at in-memory speeds is a highly recommended practice today at MPP scale, for big data analytics and similar workloads. Ignite provides both options: in-memory storage and native persistence.

So bye-bye to the data lake per se, and to being bound to one cloud provider's data lake.
In fact, you can place Ignite clusters on top of your lakes and create truly MPP-scale digital platforms for the future.
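For anyone wondering what "on top of your lakes" could look like in principle, here is a minimal Python sketch of a read-through in-memory layer in front of a slower store. It is not the Ignite API: `SlowLake` and `InMemoryLayer` are hypothetical names, and a real grid would also handle distribution, eviction, and persistence.

```python
class SlowLake:
    """Stand-in for a data lake; counts how often it is actually read."""

    def __init__(self, files):
        self._files = dict(files)
        self.reads = 0

    def read(self, path):
        self.reads += 1  # each call represents a slow storage round trip
        return self._files[path]


class InMemoryLayer:
    """Read-through cache: serve from memory, fall back to the lake once."""

    def __init__(self, lake):
        self._lake = lake
        self._cache = {}

    def read(self, path):
        if path not in self._cache:
            self._cache[path] = self._lake.read(path)
        return self._cache[path]


lake = SlowLake({"raw/events.json": b'{"clicks": 42}'})
grid = InMemoryLayer(lake)

first = grid.read("raw/events.json")   # goes to the lake
second = grid.read("raw/events.json")  # served from memory

print(lake.reads)
```

The lake remains the system of record in this sketch; the in-memory layer only avoids repeated trips to it.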

author avatar

Blob is general-purpose object storage, while Azure Data Lake is primarily meant for big data analytics, i.e., it is an HDFS-like storage mechanism.

Azure Blob Storage is a general-purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads.
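One practical consequence of "object store" versus "HDFS-like" can be sketched in Python. In a flat object namespace, a folder is usually just a shared key prefix, so renaming it means touching every object; in a hierarchical, file-system-style namespace it is a single directory entry. The two functions below are toy models written for this answer (hypothetical names, plain dictionaries), not the behaviour of any specific Azure API.

```python
from typing import Dict


def rename_prefix_flat(objects: Dict[str, bytes], old: str, new: str) -> int:
    """Flat namespace: 'rename a folder' by rewriting every matching key."""
    touched = 0
    for key in [k for k in objects if k.startswith(old)]:
        objects[new + key[len(old):]] = objects.pop(key)
        touched += 1
    return touched


def rename_dir_tree(tree: dict, old: str, new: str) -> int:
    """Hierarchical namespace: move one directory entry; children untouched."""
    tree[new] = tree.pop(old)
    return 1


flat = {"raw/a.csv": b"1", "raw/b.csv": b"2", "curated/c.csv": b"3"}
tree = {"raw": {"a.csv": b"1", "b.csv": b"2"}, "curated": {"c.csv": b"3"}}

flat_ops = rename_prefix_flat(flat, "raw/", "staging/")
tree_ops = rename_dir_tree(tree, "raw", "staging")

print(flat_ops, tree_ops)
```

Analytics jobs that write to a temporary folder and then rename it on commit are the kind of workload where this difference tends to matter.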

Please see detailed article on this here

Top 10 Real User

A data lake can hold vast pools of raw data at an optimal price. One way to picture a data lake is to compare it with a natural lake, which stores all the water from different sources in its raw form.

Any enterprise product involves more than one application, which results in different databases, streaming services, and file systems all serving a single business solution. In this case it is very difficult to retire old applications because of high-level decisions, cost, and the need to train employees on new applications.

In other cases a business solution has to be split into multiple applications so that each can be scaled, designed, and secured according to need; this approach can be more flexible and maintainable.

In both cases we end up with multiple applications and multiple databases running the complete business, each optimised for its own part of the operation. In both cases, generating insights or reports across the different applications is a cumbersome process.

This is where the data lake comes to the rescue. We can move the data from the different applications into the data lake. Once the data is there, it is easy to run operational queries, data analytics, and data exploration across applications. This helps the business gather information and build reporting capabilities effectively. Data lakes are also designed to handle high-velocity, high-volume data at an optimal price compared with database and streaming systems.
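Here is a small Python sketch of that idea, with a temporary directory standing in for the data lake. Two made-up applications (a billing system exporting CSV and a support system exporting JSON lines) land their raw files in the lake, and one report then reads across both. All file names, fields, and function names are assumptions for illustration only.

```python
import csv
import json
import tempfile
from pathlib import Path

lake = Path(tempfile.mkdtemp())  # stand-in for the data lake

# Application A (hypothetical billing system) lands a CSV export.
(lake / "billing").mkdir()
(lake / "billing" / "invoices.csv").write_text(
    "customer,amount\nacme,100\nglobex,250\n"
)

# Application B (hypothetical support system) lands JSON lines.
(lake / "support").mkdir()
(lake / "support" / "tickets.jsonl").write_text(
    '{"customer": "acme", "open_tickets": 3}\n'
    '{"customer": "globex", "open_tickets": 0}\n'
)


def load_invoices(root):
    """Read billed amounts per customer from the billing export."""
    with open(root / "billing" / "invoices.csv", newline="") as f:
        return {row["customer"]: float(row["amount"]) for row in csv.DictReader(f)}


def load_tickets(root):
    """Read open ticket counts per customer from the support export."""
    with open(root / "support" / "tickets.jsonl") as f:
        return {rec["customer"]: rec["open_tickets"] for rec in map(json.loads, f)}


def customer_report(root):
    """One report across both applications, read straight from the lake."""
    invoices, tickets = load_invoices(root), load_tickets(root)
    return {
        c: {"billed": invoices[c], "open_tickets": tickets.get(c, 0)}
        for c in sorted(invoices)
    }


print(customer_report(lake))
```

Neither application had to change its own database or format; the report only needs to know where the raw files land.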

Azure Data Lake is built on top of Blob storage (similar to S3), which stores the actual data as objects. Azure Data Lake is a service that manages the underlying data in Blob storage. One example is querying objects in the data lake from other components, such as Azure Machine Learning.
