2020-05-25T12:29:00Z

What is data lake storage?

Rony_Sklar - PeerSpot reviewer

7 Answers

KM
User
2020-05-27T16:03:25Z
May 27, 2020

These two Microsoft solutions are compatible within the Azure centralized data repository framework. A data lake is a strategy by which all forms of data can be housed in a single location. A BLOB (binary large object) is a storage type that accommodates differing data types, generically known as binary storage. The two work together on Microsoft Azure's data center platform.

We are finding that a useful and less costly alternative to centralized data storage is adding a middle layer, known generically as data virtualization. It has many of the same benefits but requires less compromise than centralized data storage solutions. The value proposition is complex to articulate outside a specific implementation, but I am happy to talk you through it.
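
To make the middle-layer idea a little more concrete, here is a minimal Python sketch. It is only an illustration under assumptions: the `VirtualLayer` class and the two in-memory "sources" (`crm_rows`, `billing_rows`) are hypothetical and do not correspond to any real product. The point it shows is that the layer keeps no copy of the data and reads each source only when a query arrives.

```python
from typing import Callable, Dict, Iterable, Iterator, List

Row = Dict[str, object]


class VirtualLayer:
    """Hypothetical middle layer: queries sources in place, stores no rows itself."""

    def __init__(self) -> None:
        self._sources: Dict[str, Callable[[], Iterable[Row]]] = {}

    def register(self, name: str, fetch: Callable[[], Iterable[Row]]) -> None:
        # Keep only a function that knows how to read the source, not the rows.
        self._sources[name] = fetch

    def query(self, predicate: Callable[[Row], bool]) -> Iterator[Row]:
        # Read every source at query time and tag each row with its origin.
        for name, fetch in self._sources.items():
            for row in fetch():
                if predicate(row):
                    yield {"source": name, **row}


# Assumed stand-ins for two separate systems that stay where they are.
crm_rows: List[Row] = [{"customer": "acme", "region": "EU"}]
billing_rows: List[Row] = [
    {"customer": "acme", "amount": 120.0},
    {"customer": "globex", "amount": 75.5},
]

layer = VirtualLayer()
layer.register("crm", lambda: crm_rows)
layer.register("billing", lambda: billing_rows)

acme = list(layer.query(lambda r: r.get("customer") == "acme"))
```

Because nothing is copied, a row added to `billing_rows` later shows up in the next query without any reload step. That is the trade-off against a centralized store: fresher data and less movement, at the cost of hitting the sources on every query.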

KK
Vendor
2020-05-27T02:45:02Z
May 27, 2020

A data lake is a data warehousing solution, whether it runs in the Azure cloud or on AWS.

The data is stored as files in their native formats and is retrieved through code or scripts, or through one of the many ETL tools available today for interpreting data from data lakes. Microsoft itself provides Azure Data Factory for this purpose.

Blob storage is similar in concept to S3 in AWS. Both are file systems in the cloud where you can store files of any format, then retrieve, use, and transform them, and so on.
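
As a small illustration of retrieving lake files "through coding or scripts", here is a hedged Python sketch. It assumes a local temporary folder standing in for the lake or blob container, and the file names and the `read_raw` helper are made up for the example. It uses only the standard library, not any Azure or AWS SDK.

```python
import csv
import json
import tempfile
from pathlib import Path

# Assumption: a local temporary folder stands in for the lake / blob container.
lake = Path(tempfile.mkdtemp(prefix="lake_"))

# Land two files in their raw formats, as an upstream system might.
(lake / "orders.csv").write_text("order_id,amount\n1,10.5\n2,4.0\n", encoding="utf-8")
(lake / "events.json").write_text(
    json.dumps([{"event": "login", "user": "a"}]), encoding="utf-8"
)


def read_raw(path: Path) -> list:
    """Interpret a raw file based on its extension (structure applied on read)."""
    if path.suffix == ".csv":
        with path.open(newline="", encoding="utf-8") as handle:
            return list(csv.DictReader(handle))
    if path.suffix == ".json":
        return json.loads(path.read_text(encoding="utf-8"))
    # Unknown formats are still stored; hand back the bytes untouched.
    return [path.read_bytes()]


records = {p.name: read_raw(p) for p in sorted(lake.iterdir())}
```

The store itself does not care what the files contain. The script decides how to interpret each one at read time, which is roughly the job an ETL tool takes over at larger scale.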

TJ
User
2020-05-26T19:28:27Z
May 26, 2020

In summary, Azure Blob Storage is a general-purpose, scalable object store created for a variety of storage scenarios, whereas Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads. I recently came across this page: https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage. I believe it provides the best answer to this question.

SS
Real User
2020-05-26T14:10:45Z
May 26, 2020

Instead of putting data in a database or a dimensional data warehouse, the best approach now is to put it in an in-memory computing grid such as Ignite for fast in-memory processing. Data lakes are still complex.

SS
Real User
2020-05-27T14:11:02Z
May 27, 2020

Although practitioners still use data lakes to store raw data across the enterprise, they remain complicated and are not vendor-neutral. Any storage mechanism, a cloud data lake included, is inefficient compared with in-memory computing such as GridGain Ignite. Processing at in-memory speeds is a highly recommended practice today at MPP scale, whether for big data analytics or other workloads. Ignite provides both in-memory storage and native persistence.

So bye-bye to the data lake per se, and to being bound to one cloud provider's data lake.
You can in fact place Ignite clusters on top of your lakes and create truly MPP-scale digital platforms for the future.
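
The "in-memory layer on top of the lake" idea can be sketched in plain Python as a read-through cache. This is not the Ignite or GridGain API. The `ReadThroughCache` class and the dictionary standing in for lake storage are assumptions made purely for illustration.

```python
from typing import Callable, Dict


class ReadThroughCache:
    """Hypothetical in-memory layer in front of a slower store."""

    def __init__(self, load: Callable[[str], bytes]) -> None:
        self._load = load  # how to fetch a value from the underlying lake
        self._memory: Dict[str, bytes] = {}
        self.misses = 0

    def get(self, key: str) -> bytes:
        if key not in self._memory:
            # First access: go to the lake, then keep the value in memory.
            self.misses += 1
            self._memory[key] = self._load(key)
        return self._memory[key]


# Assumed stand-in for lake storage.
lake = {"raw/sales.csv": b"id,amount\n1,10\n"}

cache = ReadThroughCache(lambda key: lake[key])
first = cache.get("raw/sales.csv")
second = cache.get("raw/sales.csv")  # served from memory, no second lake read
```

Only the first read touches the lake. Repeated reads come from memory, which is where the speed-up comes from. A real grid adds distribution, eviction, and persistence on top of this basic pattern.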

KP
Consultant
2020-05-27T10:20:06Z
May 27, 2020

Blob is general-purpose object storage, while Azure Data Lake is primarily meant for big data analytics, i.e. an HDFS-style storage mechanism.

Azure Blob Storage is a general-purpose, scalable object store that is designed for a wide variety of storage scenarios. Azure Data Lake Storage Gen1 is a hyper-scale repository that is optimized for big data analytics workloads.

Please see the detailed article on this here:
https://docs.microsoft.com/en-us/azure/data-lake-store/data-lake-store-comparison-with-blob-storage

GK
Real User
2020-05-27T08:05:49Z
May 27, 2020

A data lake can hold vast pools of raw data at an optimal price. One way to picture a data lake is to compare it with a natural lake, which stores water from many different sources in its raw form.

Any enterprise product involves more than one application, which results in different databases, streaming services, and file systems all serving a single business solution. In this case it is very difficult to retire old applications because of high-level decisions, cost, and the need to train employees on new applications.

In other cases a business solution has to be segregated into multiple applications so that individual applications can be scaled, designed, and secured according to need, which makes the solution more flexible and maintainable.

In both cases we end up with multiple applications and multiple databases running the complete business, each optimised for its own business operation. In both cases, generating insights or reports across the different applications is a cumbersome process.

This is where the data lake comes to the rescue. We can move the data from the different applications into the data lake. Once the data is there, it is easy to run operational queries, data analytics, and data exploration across applications. This helps the business gather information and build reporting capabilities effectively. Data lakes are also designed to handle high-velocity, high-volume data at an optimal price compared with database and streaming systems.
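
Here is a rough Python sketch of that flow: land raw exports from several applications in one place, then report across them. Everything specific in it is an assumption for illustration, including the local folder acting as the lake, the `webshop` and `support` application names, and the `land` and `report` helpers.

```python
import json
import tempfile
from collections import defaultdict
from pathlib import Path

# Assumption: a local folder plays the role of the data lake.
lake = Path(tempfile.mkdtemp(prefix="lake_"))


def land(app: str, name: str, rows: list) -> Path:
    """Copy one application's export into the lake as raw JSON lines."""
    target = lake / app / name
    target.parent.mkdir(parents=True, exist_ok=True)
    target.write_text("\n".join(json.dumps(r) for r in rows), encoding="utf-8")
    return target


# Hypothetical exports from two separate applications.
land("webshop", "orders.jsonl", [{"customer": "acme", "amount": 40},
                                 {"customer": "globex", "amount": 15}])
land("support", "tickets.jsonl", [{"customer": "acme", "ticket": 7}])


def report() -> dict:
    """Build one per-customer view across every application in the lake."""
    summary = defaultdict(lambda: {"spend": 0, "tickets": 0})
    for path in sorted(lake.glob("*/*.jsonl")):
        for line in path.read_text(encoding="utf-8").splitlines():
            row = json.loads(line)
            summary[row["customer"]]["spend"] += row.get("amount", 0)
            summary[row["customer"]]["tickets"] += 1 if "ticket" in row else 0
    return dict(summary)


result = report()
```

Neither application had to change its own database. Each one only exports raw rows, and the cross-application view is computed from the lake.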

Azure Data Lake is built on top of Blob storage (similar to S3), which stores the actual data as objects. Azure Data Lake is a service that manages the underlying data in Blob storage, for example by letting other components, such as Azure Machine Learning, query objects in the data lake.
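
The layering can be pictured with a short Python sketch. It is a toy model, not the Azure SDK: `BlobStore` and `LakeService` are hypothetical classes, and the folder-style listing is just one example of what a management layer might add over a flat object store.

```python
from typing import Dict, List


class BlobStore:
    """Flat object store: every object is just a key and some bytes."""

    def __init__(self) -> None:
        self._objects: Dict[str, bytes] = {}

    def put(self, key: str, data: bytes) -> None:
        self._objects[key] = data

    def get(self, key: str) -> bytes:
        return self._objects[key]

    def keys(self) -> List[str]:
        return sorted(self._objects)


class LakeService:
    """Hypothetical management layer that adds folder-style access on top."""

    def __init__(self, store: BlobStore) -> None:
        self._store = store

    def write(self, folder: str, name: str, data: bytes) -> None:
        self._store.put(f"{folder.strip('/')}/{name}", data)

    def list_folder(self, folder: str) -> List[str]:
        # The store has no real folders; filter keys by prefix instead.
        prefix = folder.strip("/") + "/"
        return [k[len(prefix):] for k in self._store.keys() if k.startswith(prefix)]


store = BlobStore()
lake = LakeService(store)
lake.write("sales/2020", "may.csv", b"id,amount\n1,10\n")
lake.write("sales/2020", "june.csv", b"id,amount\n2,20\n")
lake.write("hr", "staff.json", b"[]")

files = lake.list_folder("sales/2020")
```

The bytes live only in the object store. The service on top decides how they are named, grouped, and exposed to other components.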
