What is the difference between Azure Data Lake and Blob storage?
These two Microsoft offerings are compatible within the Azure centralized data repository framework. A data lake is a strategy by which all forms of data can be housed in a single location. Blob storage (BLOB stands for binary large object) accommodates differing data types by storing them as unstructured binary objects. The two work together on Microsoft's Azure platform.
We are finding that a useful and less costly alternative to centralized data storage is to add a middle layer, known generically as data virtualization. It has many of the same benefits but requires less compromise than centralized data storage solutions. The value proposition is hard to articulate outside a specific implementation, but I am happy to talk you through it.
A data lake is a large-scale data storage solution, whether it runs in the Azure cloud or on AWS.
The data is stored as files and retrieved either through code or scripts, or through one of the many ETL tools available today for reading data from data lakes. Microsoft itself provides Azure Data Factory for moving and transforming data held in data lakes.
Blob storage is similar in concept to S3 in AWS. It is an object store in the cloud where you can keep files of any format, then retrieve, use, and transform them, and so on.
In summary, Azure Blob Storage is a general-purpose, scalable object store created for a variety of storage scenarios, whereas Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads. I recently came across this page: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage. I believe it provides the best answer to this question.
Instead of putting data in a database or a dimensional data warehouse, the best way now is to put it in an in-memory computing grid such as Ignite for fast in-memory computing. A data lake is still complex.
Although practitioners still use data lakes to store raw data across the enterprise, they remain complicated and not vendor-neutral. Any storage mechanism, a cloud data lake included, is inefficient compared to in-memory computing such as GridGain Ignite. Processing in memory is a highly recommended practice today at MPP scale, for big data analytics and similar workloads. Ignite provides both in-memory and native persistence options.
So bye-bye to the data lake per se, and to being bound to one cloud provider's data lake.
You can in fact place Ignite clusters on top of your lakes and create truly MPP-scale digital platforms for the future.
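To make the "in-memory layer on top of a lake" idea concrete, here is a minimal read-through cache sketch in plain Python. It is not Ignite's API: the `ReadThroughCache` class, the `LAKE` dict, and `load_from_lake` are all hypothetical stand-ins, and a real grid would add distribution, eviction, and persistence on top of this.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """In-memory layer in front of a slower backing store (illustrative only)."""

    def __init__(self, load_from_lake: Callable[[str], bytes]) -> None:
        self._load = load_from_lake
        self._memory: Dict[str, bytes] = {}
        self.lake_reads = 0  # counts how often we had to go to the slow store

    def get(self, key: str) -> bytes:
        # Only the first request for a key reaches the lake; later ones hit memory.
        if key not in self._memory:
            self.lake_reads += 1
            self._memory[key] = self._load(key)
        return self._memory[key]


# Hypothetical "lake": a dict standing in for files held in cloud storage.
LAKE = {"sales/2024.csv": b"id,amount\n1,10\n2,32\n"}


def load_from_lake(key: str) -> bytes:
    return LAKE[key]


cache = ReadThroughCache(load_from_lake)
first = cache.get("sales/2024.csv")   # loaded from the lake
second = cache.get("sales/2024.csv")  # served from memory
```

After the two calls, `cache.lake_reads` is 1: the lake is still the system of record, and the memory layer only saves repeated trips to it.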
Blob is general-purpose object storage, while Azure Data Lake is primarily meant for big data analytics, i.e. an HDFS-like storage mechanism.
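One practical consequence of "object store versus HDFS-like file system" is how folders behave. In a flat object store a folder is only a shared key prefix, while in a hierarchical namespace a directory is a real entry. The sketch below models both with plain Python dicts. The function names and sample keys are made up for illustration, and it is a rough model of the idea rather than how either Azure service is implemented.

```python
from typing import Dict


def rename_prefix_flat(store: Dict[str, bytes], old: str, new: str) -> int:
    """Rename a 'folder' in a flat object store: every matching key is rewritten."""
    touched = [key for key in store if key.startswith(old)]
    for key in touched:
        store[new + key[len(old):]] = store.pop(key)
    return len(touched)  # number of objects that had to be moved


def rename_dir_tree(root: Dict[str, dict], old: str, new: str) -> int:
    """Rename a directory in a tree namespace: one entry in the parent changes."""
    root[new] = root.pop(old)
    return 1


# Flat namespace: 1000 log objects that merely share the "logs/" prefix.
flat = {f"logs/2024/part-{i}.json": b"{}" for i in range(1000)}
flat["images/cat.png"] = b"..."
flat_ops = rename_prefix_flat(flat, "logs/", "archive/")

# Hierarchical namespace: the same files under a real "logs" directory.
tree = {
    "logs": {"2024": {f"part-{i}.json": b"{}" for i in range(1000)}},
    "images": {"cat.png": b"..."},
}
tree_ops = rename_dir_tree(tree, "logs", "archive")
```

Here `flat_ops` is 1000 and `tree_ops` is 1. Analytics jobs that write to a temporary directory and then rename it into place benefit from the second behaviour, which is one reason file-system semantics matter for big data workloads.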
Azure Blob Storage is a general-purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads.
Please see detailed article on this here
A data lake can hold vast pools of raw data at an optimal price. One way to imagine a data lake is to compare it with a natural lake, which stores all the water from different sources in its raw form.
For any enterprise product there will be more than one application, which results in different databases, streaming services, and file systems all serving a single business solution. In this case it is very difficult to retire old applications because of high-level decisions, cost, and the need to train employees on new applications.
In other cases the business solution has to be split into multiple applications so that each one can be scaled, designed, and secured according to need; this approach can be more flexible and maintainable.
In both cases we end up with multiple applications and multiple databases for running the complete business, each optimised for one business operation. In both cases, generating insights or reports across the different applications is a cumbersome process.
The data lake comes to the rescue. We can move the data from the different applications into the data lake. Once it is there, it is easy to run operational queries, data analytics, and data exploration across applications. This helps the business build information and reporting capabilities effectively. Data lakes are also designed to handle high-velocity, high-volume data at an optimal price compared to databases and streaming systems.
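As a small illustration of the point above, the sketch below lands raw exports from two applications in one folder tree and then answers a question that needs both. Everything here is hypothetical: a temporary local directory stands in for the lake, and the order and ticket records are invented sample data. A real lake would sit on cloud storage and be queried with an analytics engine.

```python
import csv
import json
import tempfile
from pathlib import Path

# Hypothetical raw exports from two separate applications.
ORDERS_CSV = "order_id,customer,amount\n1,alice,30\n2,bob,12\n3,alice,8\n"
TICKETS_JSON = [
    {"ticket_id": 101, "customer": "alice"},
    {"ticket_id": 102, "customer": "carol"},
]


def land_raw_data(lake: Path) -> None:
    """Copy each application's export into the lake unchanged, in its own folder."""
    (lake / "orders").mkdir(parents=True)
    (lake / "support").mkdir(parents=True)
    (lake / "orders" / "orders.csv").write_text(ORDERS_CSV)
    (lake / "support" / "tickets.json").write_text(json.dumps(TICKETS_JSON))


def spend_and_tickets(lake: Path) -> dict:
    """Answer one cross-application question by reading both raw files."""
    report: dict = {}
    with (lake / "orders" / "orders.csv").open(newline="") as f:
        for row in csv.DictReader(f):
            entry = report.setdefault(row["customer"], {"spend": 0, "tickets": 0})
            entry["spend"] += int(row["amount"])
    for ticket in json.loads((lake / "support" / "tickets.json").read_text()):
        entry = report.setdefault(ticket["customer"], {"spend": 0, "tickets": 0})
        entry["tickets"] += 1
    return report


with tempfile.TemporaryDirectory() as tmp:
    lake_root = Path(tmp)
    land_raw_data(lake_root)
    report = spend_and_tickets(lake_root)
```

The data keeps its original formats (CSV and JSON); structure is applied only when the report is read, which is the usual trade-off of a lake compared with loading everything into one database schema first.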
Azure Data Lake (in its Storage Gen2 form) is built on top of Blob storage (similar to S3), which stores the actual data as objects. Azure Data Lake is a service that manages the underlying data in Blob storage. Example: querying objects in the data lake from other components, such as Azure Machine Learning.