Databricks Valuable Features

JH
Solution Architect at an insurance company with 10,001+ employees

There are good features for turning off clusters. Basically, if we aren't using a cluster, it is turned off. When a user starts accessing it, it starts up again, so we save on computing.
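As a rough illustration of the idle shutdown described above, a cluster definition can carry an auto-termination timeout. The sketch below only builds and prints such a definition. The field names follow the Databricks Clusters API, but the cluster name, runtime version, node type, and timeout are placeholder values assumed for the example, not settings taken from this review.

```python
import json

# Illustrative cluster specification. The field names follow the Databricks
# Clusters API; every value below is a placeholder chosen for this example.
cluster_spec = {
    "cluster_name": "shared-analytics",    # hypothetical name
    "spark_version": "13.3.x-scala2.12",   # example runtime string
    "node_type_id": "Standard_DS3_v2",     # example Azure VM size
    "autoscale": {"min_workers": 1, "max_workers": 4},
    # Terminate the cluster after this many minutes without activity.
    "autotermination_minutes": 30,
}

# Print the JSON payload that would be sent when creating the cluster.
print(json.dumps(cluster_spec, indent=2))
```

A shorter timeout saves more compute but means users wait for a restart more often, so the right value depends on how the team works.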

Our data lake team likes the interface very much because it is straightforward. Of course, you need to understand the different clusters when they are started.

There are nice features for machine learning and analytics.

The security features allow us to integrate with Active Directory and assign different people to different databases.

The solution has a good interface with Python.

There is good integration with Azure so we can access the solution over the standard Azure interface and use the storage pro measure. 

SS
Business Architect at YASH Technologies

The solution is an impressive tool for data migration and integration. 

The run time is very quick.

AbhishekGupta - PeerSpot reviewer
Engineering Leader at Walmart

The solution's features are fantastic and include interactive clusters that perform at top speed when compared to other solutions.

The ATC monitoring experience and the maturity of the APIs are very good. 

Buyer's Guide
Databricks
April 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,415 professionals have used our research since 2012.
Sudhendra Umarji - PeerSpot reviewer
Technical Architect at Infosys

The ability to stream data and the windowing feature are valuable. There are a number of targeted integration points, and that is a difference between Stream Analytics and Databricks. The input and output integrations are better in Databricks. It's easy to use Python or even Java. I can take a third-party library, deploy it, and use it.
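For readers unfamiliar with windowing, here is a minimal plain-Python sketch of the idea behind a tumbling (fixed, non-overlapping) window: each event is assigned to a time bucket and aggregated per bucket. This is not the Spark or Databricks API. The function name, the sample events, and the 10-second window are assumptions made for the illustration.

```python
from collections import Counter


def tumbling_window_counts(events, window_seconds):
    """Count events per fixed, non-overlapping time window.

    events: iterable of (timestamp_in_seconds, key) pairs.
    Returns a dict mapping (window_start, key) to a count.
    """
    counts = Counter()
    for ts, key in events:
        # Round the timestamp down to the start of its window.
        window_start = (ts // window_seconds) * window_seconds
        counts[(window_start, key)] += 1
    return dict(counts)


# Hypothetical sample events: (timestamp in seconds, event type).
events = [(1, "click"), (4, "click"), (9, "view"), (12, "click"), (19, "view")]
result = tumbling_window_counts(events, window_seconds=10)
print(result)
```

A streaming engine applies the same bucketing continuously as events arrive, and also has to handle late or out-of-order data, which this sketch leaves out.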

Nabil Fegaiere1 - PeerSpot reviewer
Chief Executive Officer at dotFIT, LLC

It's very simple to use Databricks Apache Spark. It's really good for parallel execution to scale up the workload. In this context, the usage is more about virtual machines.

Using meta-stores like Hive was optional, and the solution is good for data science use cases. With the Authenticator Log, Databricks is good for data transformation and BI usage. We have a platform.

Karan  Sharma - PeerSpot reviewer
Data Analyst at Allianz

The most valuable aspect of the solution is that it is quite fast, especially in its computation and in the atomicity of reading data. We have a storage account, and we can read the data on the go. We also now have Unity Catalog in Databricks, which is quite good for giving you insight into the metadata of the data you're going to process. There are a lot of things that are quite nice with Databricks.

Avadhut Sawant - PeerSpot reviewer
Consulting Architect at a computer software company with 10,001+ employees

A very valuable feature is the data processing, and the solution is specifically good at using the Spark ecosystem.

Axel Richier - PeerSpot reviewer
Tech Lead Consultant | Manager Data Engineering at Ekimetrics

The shared experience of collaborative notebooks is probably the most useful aspect since, as an expert, it allows me to help my juniors debug their notebooks and their code live. I can do some live coding with them or help them find the errors very efficiently.

It's simple to set up.

I love Databricks due to the fact that we can deploy it in 15 minutes and it's ready to use. That's very nice.

The solution is stable. 

We can scale the product.

Alex Tsui - PeerSpot reviewer
Sr. Director at Omnicell

The simplicity of development is the most valuable feature. 

RichardXu - PeerSpot reviewer
Data Science Lead at a mining and metals company with 10,001+ employees

The scalability brings value to this solution.

It can send out large data amounts.

Sahil Taneja - PeerSpot reviewer
Principal Consultant/Manager at Tenzing

The processing capacity is tremendous in the database. We use Azure for storage, and we have not faced any challenges. The connectors to different data sources are also valuable. Moreover, it is not a language-dependent tool, so development goes faster. That is one of the best features of Databricks.

AO
Lead Data Scientist at a manufacturing company with 10,001+ employees

We have the ability to scale, collaborate and do machine learning.

Anand Sharma - PeerSpot reviewer
Sr Data Engineer at PIMCO

The most valuable feature is the versatility of the ecosystem. You can write code in SQL, Python, or Java.

The load distribution capabilities are good, and you can perform data processing tasks very quickly.

You can save and share notebooks between different teams.

The interface is easy to use.

Rupal Sharma - PeerSpot reviewer
Data Architect at Three Ireland (Hutchison) - Infrastructure

Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours.
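To put the Teradata comparison above in relative terms, the short calculation below converts the quoted run times (five hours versus three to three and a half hours) into a percentage of time saved and a speedup factor. The helper names are made up for this illustration, and the figures are the reviewer's estimates, not benchmark results.

```python
def time_saved_percent(before_hours, after_hours):
    """Percentage reduction in run time."""
    return 100 * (before_hours - after_hours) / before_hours


def speedup(before_hours, after_hours):
    """How many times faster the job finishes."""
    return before_hours / after_hours


# Run times quoted in the review: 5 h before, 3 to 3.5 h after.
for after in (3.0, 3.5):
    print(
        f"5 h -> {after} h: "
        f"{time_saved_percent(5, after):.0f}% less time, "
        f"{speedup(5, after):.2f}x faster"
    )
```

By this arithmetic, the quoted times correspond to roughly 30 to 40 percent less run time.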

So that's why it's quite convenient to use for data science, for training machine learning models. By using more computing power, you can make it even faster.

Kevin McAllister - PeerSpot reviewer
Executive Manager at Hexagon AB

Databricks' most valuable feature is the data transformation through PySpark.

PankajKumar13 - PeerSpot reviewer
Computer Scientist at Adobe

The features I found most helpful with Databricks are the Lakehouse and SQL environments.

Shiva Prasad ELLUR - PeerSpot reviewer
Vice President - Data Engineering and Analytics at a financial services firm with 10,001+ employees

We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time.

This product allows us to write the ML models in a way that lets us take advantage of the parallel scaling computer window backend on any of the satellite services.

RC
Sr. BigData Architect at ITC Infotech

The elasticity of the solution is excellent.

The storage, etc., can be scaled up quite easily when we need it to.

It's easy to increase performance as required.

The solution runs on Spark very well.

SA
Principal at a computer software company with 5,001-10,000 employees

From a data science and applied analytics perspective, what I like about Databricks is that it's probably one of the most popular platforms that give access to folks who are trying not just to do exploratory work on the data but also go ahead and build advanced modeling and machine learning on top of that, and then go ahead and make that available for dissemination of insights. For example, you can save all data and build out endpoints, so business analysts and users can access that data through a dashboard.

During the process, I also like that Databricks allows you to do version control to keep track of your operations on the data and maintain that lineage to create reproducible results.

The most significant Databricks advantage is that you can do everything within the platform. You don't need to exit the platform because it's a one-stop shop that can help you do all processes.

The solution is top-notch from a data science, applied ML, or advanced analytics perspective.

PraveenS - PeerSpot reviewer
Design Engineer at Cyient Limited

We extensively use the product’s notebooks, jobs, and triggers. We can create activities. Wherever translation is required, we use Databricks. The product fulfills our customer requirements. It is a cost-effective solution.

Jeremy Salt - PeerSpot reviewer
Sr. Data Quality Analyst at Seek

Databricks makes it really easy to use a number of technologies to do data analysis. In terms of languages, we can use Scala, Python, and SQL. Databricks enables you to run very large queries, at a massive scale, within really good timeframes.

I'm starting to build a solution using Delta Live Tables and Delta Live pipelines, and it is proving to be exceptionally easy to use. I have also been able to quickly implement a pipeline.

AB
STI Data Leader at grupo gtd

I like the simplicity and ease of use. 

You can deploy the solution to many clouds easily. 

The initial setup is straightforward.

The solution offers a free community version.

Elizabeth Ho - PeerSpot reviewer
Manager, Customer Journey at a retailer with 10,001+ employees

Databricks lets you schedule jobs pretty easily, and you can use SQL, Spark SQL, Python, or R. It also allows you to save a table or view. 

I like that you can connect to multiple data sources. Most of our data is stored in the Azure data lake, but my previous company connected to SQL databases or even blob storage. 

They've improved on many features. I don't do data engineering, but I had an issue a couple of years ago, two companies back. It took a long time to read and save tables, but I think the new Delta feature helped.

I like how easy it is to share your notebook with others. You can give people permission to read or edit. I think that's a great feature. You can also pull in code from GitHub pretty easily. I didn't use it that often, but I think that's a cool feature.

RM
Head of Business Integration and Architecture at Jakala

The Delta Lake data type has been the most useful part of this solution. Delta Lake is an open-source data type that was invented and implemented by Databricks. It is the most important element of the solution. Databricks also offers exceptional performance and scalability.

Jithin James - PeerSpot reviewer
Financial Analyst 4 (Supply Chain & Financial Analytics) at Juniper Networks

Databricks is hosted on the cloud. It is very easy to collaborate with other team members who are working on it. It is production-ready code, and scheduling the jobs is easy.

GR
Head of Referential and Big Data at a financial services firm with 5,001-10,000 employees

I like cloud scalability and data access for any type of user.

JH
Head of Credit Risk and Data at Cegid Invoice and Financing

Databricks gives us the ability to build a lakehouse framework and do everything implicit to this type of database structure. We also like the ability to stream events. Databricks covers a broad spectrum, from reporting and machine learning to streaming events. It's important for us to have all these features in one platform.

RC
Data Engineering Manager at a pharma/biotech company with 10,001+ employees

The most valuable feature is the Spark cluster, which is very fast for heavy loads, big data processing, and PySpark.

Tajinder_Singh - PeerSpot reviewer
Senior Software Engineer at a computer software company with 201-500 employees

The most valuable features are data engineering and data science because we can create Notebooks on them. We can use any Python library to build data science models, or we can use libraries like Seaborn or Matplotlib to create charts based on data for data analysis. It is a really valuable capability.

MahalaxmanraoChappedi - PeerSpot reviewer
Associate Principal - Data Engineering at a tech services company with 10,001+ employees

I like that Databricks is a unified platform that lets you do streaming and batch processing in the same place. You can do analytics, too. They have added something called Databricks SQL Analytics, allowing users to connect to the data lake to perform analytics. Databricks also will enable you to share your data securely. It integrates with your reporting system as well.

The Unity Catalog provides you with data lineage and metadata capabilities. These are some of the unique features that fulfill all the requirements of the banking domain.

Oscar Estorach - PeerSpot reviewer
Chief Data-strategist and Director at Theworkshop.es

The solution is very easy to use. 

The storage on offer is very good. 

The solution is perfect for dealing with big data.

The artificial intelligence on offer is very good.

The product is quite flexible.

We have found the solution to be stable. 

The cloud services on offer are very reasonably priced.

Technical support is very good. They also have very good documentation on offer to help you navigate the product and learn about its offerings. 

MA
Senior Data Engineer at TCS

Databricks is a unified solution that we can use for streaming. It supports open-source languages, which are cloud-agnostic. When I do database coding, if any other tool has a language similar to Excel or SQL, I can use the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries, which I can use very easily.

Sanjay Bheemasenarao - PeerSpot reviewer
Director - Data Engineering expert at Sankir Technologies

Databricks has a scalable Spark cluster creation process. The creators of Databricks are also the creators of Spark, and they are the industry leaders in terms of performance.

Databricks has made great strides in terms of performance. 

It is very user friendly. I like the ease of creating a Spark cluster, submitting a job, or creating a notebook.

The UI has also changed for the better compared to what it was two years ago.

MILTON FERREIRA - PeerSpot reviewer
Co-founder/Senior Data Scientist at Hence

The most valuable feature of Databricks is the integration of the data warehouse and data lake, and the development of the lake house. Additionally, it integrates well with Spark for processing data in production. 

Olubisi Akintunde - PeerSpot reviewer
Team Lead at a tech services company with 1,001-5,000 employees

The flexibility of Databricks is the most valuable feature. It gives us the ability to write analytics code in multiple languages.

There is a single workspace for different data roles like data engineers, machine learning engineers, and the end user, who can connect to the same system. 

Databricks separates compute from storage, so you are not coupled with the underlying data sets, allowing for multiple processes and multiple programs to be written on the same code.

IshwarSukheja - PeerSpot reviewer
Sr Manager Data Scientist at Bizmetric

The solution is built from Spark and has integration with MLflow, which is important for our use case. 

Databricks is also user-friendly, providing customizable codes and models that allow people to experiment quickly. 

Integration of Delta Lake is another useful feature.

AJ
Lead Analytics at a manufacturing company with 10,001+ employees

In the manufacturing industry, Databricks can be beneficial to use because of machine learning. It is useful for tasks, such as product analysis or predictive maintenance.

JK
Lead Architect at Birlasoft India Ltd.

This solution offers a lakehouse data concept that we have found exciting. We are able to have a large amount of data in a data lake and can manage all relational activities. All ACID compliance properties are available, and this is very useful to ensure the quality of all data.

Jorge Alvarado - PeerSpot reviewer
Owner at a marketing services firm with 1-10 employees

Databricks' Lakehouse architecture has been most useful for us. The data governance has been very efficient compared with other kinds of solutions.

Anirban Bhattacharya - PeerSpot reviewer
Practice Head, Data & Analytics at a tech vendor with 10,001+ employees

Databricks can cut across the entire ecosystem of open source technology which gives an extra level in terms of getting the transformatory process of the data. The solution is primarily open source and they have bolstered its components to make it more fit for purpose for a complete Azure Data platform. The solution is responsible for the core transformatory activities. While Azure Data Factory is very good for pulling in the data, doing the basic standardization and profiling, Databricks is more about making fundamental changes in structure or in size of the data and aligning it for subsequent consumption, or for the final layer on Synapse. It also has the power to complement and work with Spark and elements related to Python. 

Tristan Bergh - PeerSpot reviewer
Data Scientist at a computer software company with 501-1,000 employees

Immense ease in running very large scale analytics, with a convenient and slick UI. This saved us from having to tweak, tune, dive into deeper abstractions, get involved in procurement, and also having to wait for other workloads to run.

The built-in optimization recommendations halved query run times and allowed us to reach decision points and deliver insights very quickly.

The Delta data format proved excellent. Databricks had already done the heavy lifting and optimized the format for large scale interactive querying. They saved us a lot of time.

Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC

The most valuable Databricks feature for us is that it does not require us to configure clusters. It automatically configures the clusters to the right size, the right number of clusters, the right number of nodes per cluster, et cetera.

Sarbani Maiti - PeerSpot reviewer
Vice President at a tech services company with 51-200 employees

Databricks is quite easy to use and requires less coding and customization than a solution like AWS SageMaker, which I'd previously used on a lot of projects. Databricks enables more people to efficiently build and host their ML code. Another great aspect is that MLflow is already integrated with Databricks, which makes a big difference. It enables us to track and monitor all our different experiments. We have mostly used the MLflow part and generic notebooks for building machine learning models, as well as PyTorch for some of our medical imaging. We were able to quickly deploy both these features without requiring anything extra.

PD
Enterprise Data Architect at a financial services firm with 51-200 employees

Its lightweight and fast processing are valuable.

KG
Associate Manager at a consultancy with 501-1,000 employees

The main strength of the solution is its efficiency.

We were trying to process 300 million records covering 10 years. Processing that many records through the ADF pipeline in Azure, for example, took approximately six hours. In order to reduce the burden on our ADF pipeline, we wrote some simple code in this solution to read and write the file to the temporary Storage Explorer. By going through this solution, we were able to complete the processing of the data in half an hour.

The technology that allows us to write scripts within the solution is extremely beneficial. If I can script in SQL, R, Scala, Apache Spark, or Python, for example, I can use that knowledge to write a script in this solution. It is very user-friendly, and you can also process and validate the records.

The ability to migrate from one environment to another is useful.

HA
Cloud Administrator at a retailer with 5,001-10,000 employees

The solution is very simple and stable.

Diego Henrique Da Silva Bastos - PeerSpot reviewer
Data Engineer Analyst at Metyis

The most valuable features of Databricks are the notebook, data factory, and ease of use.

AM
Global Data Architecture and Data Science Director at FH

Databricks gives you the flexibility of using several programming languages independently or in combination to build models.

The quick visualization of the data is very good.

The workload management functionality works well.

MM
Lead Data Architect at a government with 1,001-5,000 employees

The Databricks notebooks with SQL and Python provide a good, intuitive development environment. Delta Lake, the reading of the underlying file storage, and the Delta tables mounted on top of the data lake (AWS in our case) provide full ACID compliance, good connectivity, and interoperability.

The initial setup is fairly straightforward. The stability is good.

YK
Pre-sale Leader, Big Data Enterprise Solutions at Ness Technologies

The most valuable feature is the ability to use SQL directly with Databricks. That is the most relevant thing for my current project.

After deployment, it is easy to load files and query data.

OB
Cloud & Infra Security, Group Manager at a tech vendor with 10,001+ employees

Databricks helps crunch petabytes of data in a very short period of time for data scientists or business analysts. It helps with fraud analysis, finance, projections, etc. I like it.

This is exactly the purpose of big data and analytics. It provides the mechanism to process and analyze a stream of information. It's best for share analysis and stream analysis.

OB
IT Manager: User Support at a financial services firm with 10,001+ employees

I think what I value is more about the technology itself because you don't need to have too much knowledge to be able to use the solution. 

it_user1050483 - PeerSpot reviewer
CEO at Inosense

Valuable features would have to include the Notebook for piping some models and the feature of executing the notebooks in parallel, in batches, which is also something that we use. And we use the Notebook on Spark with Python.

RD
Data Scientist at a retailer with 5,001-10,000 employees

One of the features provides nice interactive clusters, or compute instances that you don't really need to manage often. You can just spin it off and use that for a lot of your pre-processing, which is very convenient. 

The normal features are very good in terms of doing some quick development or doing some EDA.

Also, one of the newest features brought into this solution provides you with a way to solve, deploy, and train models using the platform itself. Or, it can connect to your Azure Machine Learning in order to train, deploy, and productionalize some of the machine learning models.

VP
Data Scientist at an energy/utilities company with 10,001+ employees

Of the available feature set, I like the Imageflow feature a lot. It is very interesting. It gives me clarity on the execution of a process. I can draw the complete flow from start to finish in the exact way that I want it to execute. It is more visual and it is also easier for the people in businesses where I make presentations to understand.  

When I demonstrate a process to a business and show them the approach I am taking using code and technical language, then of course not many are going to understand that. But when I show them the process in terms of the graphical layout Imageflow helps provide, then they will be able to understand it much easier. They understand why I am choosing a particular way of executing the process and why I am taking certain steps in the way I have chosen to do it. The point is to help other people understand the solution more clearly.  

Mullai Selvan - PeerSpot reviewer
Project Manager at MAQ Software

The most valuable feature of Databricks is the integration with Microsoft Azure.

Natalia  Raffo - PeerSpot reviewer
Co - Founder & Chief Data Officer -CDO at Data360

Databricks allows me to automate the creation of a cluster, optimized for machine learning and construct AI machine learning models for the client.

RB
Business Intelligence Coordinator Latam at a construction company with 5,001-10,000 employees

The ability to use different types of coding is valuable. Databricks also has good performance because it is running in Spark extra storage, meaning the performance and the capacity use different kinds of codes.

AP
Chief Research Officer at a consumer goods company with 1,001-5,000 employees

I think the features I like the most are the scalability of the solution as well as its ability to share. We work with multiple people on notebooks and it enables us to work collaboratively in an easy way without having to worry about the infrastructure. I think the solution is very intuitive, very easy to use. And that's what you pay for.

PG
Data Science Developer at a tech services company with 501-1,000 employees

Databricks is based on a Spark cluster and it is fast. Performance-wise, it is great.

This solution has very good machine learning libraries built-in.

The support for big data is good.

LV
Advanced Analytics Lead at a pharma/biotech company with 1,001-5,000 employees

The solution is easy to use and has a quick start-up time due to being on the cloud.

AD
Business Intelligence and Analytics Consultant at a tech services company with 201-500 employees

The most valuable feature is the ability to switch loads between multiple clusters.

Automation with Databricks is very easy when using the API.

The ability to write code and SQL in the same interface is useful.

It is easy to connect notebooks to a cluster.

There are a large number of inbuilt functions that help to make things easier.

BG
Data Architect at a tech services company with 201-500 employees

The fast data loading process and data storage capabilities are great.

Based on the data loads and the performance, you can easily scale up the clusters.

NH
Director of Data (Engineering & Science) at a tech services company with 11-50 employees

The ease of use and its accessibility are valuable.

RP
Big Data and Cloud Architect at a computer software company with 201-500 employees

Databricks' most valuable features are the workspace and notebooks. Its integration, interface, and documentation are also good.

SN
Head of Data & Analytics at a tech services company with 11-50 employees

You can spin up an Azure Databricks cluster, and integrating with it is seamless.

The integration with Python and the notebooks really helps.

SV
Engineer at a tech services company with 10,001+ employees

The time travel feature is the solution's most valuable aspect.

SH
Data Science Consultant at Syniti

I found that PySpark is the most useful tool. It uses in-memory calculation and when you want to run a model it does it very quickly. We used to use Python and when we migrated to PySpark the performance was much better.

DW
Machine Learning Engineer at a tech vendor with 51-200 employees

The most valuable aspect of the solution is its notebook. It's quite convenient to use, both in terms of research and development and in terms of final deployment. I can just declare the Spark jobs by the load tables. It's quite convenient.

SC
Chief Data Scientist at a tech services company with 11-50 employees

Databricks integrates well with other solutions.

AA
Technical Architect at a tech services company with 10,001+ employees

I like the ability to use workspaces with other colleagues because you can work together even without seeing the other team's job. So you can create a robust solution by working together with other professionals.

HL
Business Development Specialist at a tech services company with 51-200 employees

Databricks covers the end-to-end data analytics workflow in one platform. This is the best feature of the solution.
