Alex Tsui - PeerSpot reviewer
Sr. Director at Omnicell
Real User
A stable, scalable solution that simplifies the development process but needs more debuggers and components
Pros and Cons
  • "The simplicity of development is the most valuable feature."
  • "Databricks has a lack of debuggers, and it would be good to see more components."

What is our primary use case?

We use the solution for data engineering. 

How has it helped my organization?

The tool helps us manage large amounts of data. 

What is most valuable?

The simplicity of development is the most valuable feature. 

What needs improvement?

Databricks has a lack of debuggers, and it would be good to see more components. 

Another issue is that the D4 data format keeps changing on our cluster. This doesn't affect me much because I use functions to define it, but it is very frustrating for some more casual users. One day the output will be in a particular format, and then it becomes an object without us changing the cluster configuration. As a small team, we don't have the capacity to dig deeply into the issue, which has been frustrating.
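The workaround mentioned above, using functions to pin down the format, might look something like the sketch below. It assumes the value in question is date-like, which the review does not confirm. The function name `to_fixed_format` and the `YYYY-MM-DD` target format are hypothetical choices for illustration only.

```python
from datetime import date, datetime


def to_fixed_format(value):
    """Return value as a 'YYYY-MM-DD' string, whether it arrives as text or as an object.

    Hypothetical helper: the review does not say which format or types are involved.
    """
    # datetime is a subclass of date, so check it first and drop the time part.
    if isinstance(value, datetime):
        return value.date().isoformat()
    if isinstance(value, date):
        return value.isoformat()
    if isinstance(value, str):
        # Parse and re-emit so that malformed strings fail loudly instead of passing through.
        return datetime.strptime(value.strip(), "%Y-%m-%d").date().isoformat()
    raise TypeError(f"unsupported type: {type(value).__name__}")
```

Routing every output through one function like this means casual users see the same shape regardless of what the cluster returns on a given day.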

Buyer's Guide
Databricks
May 2024
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
771,212 professionals have used our research since 2012.

For how long have I used the solution?

We have been using the solution for three years. 

What do I think about the stability of the solution?

The solution's stability is good. 

What do I think about the scalability of the solution?

The product is scalable. We're a small organization with 12 users, and we don't currently have any plans to increase our usage.

What was our ROI?

We see an ROI from Databricks. 

What other advice do I have?

I would rate the solution seven out of ten. 

It's a good solution, particularly for handling large amounts of data. Databricks is better as a batch processing system than as an interactive system. The performance is a little disappointing because the in-memory processing is supposed to be excellent, but it's not as competitive as some other solutions out there in this regard. Even classical databases can respond and process faster.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sahil Taneja - PeerSpot reviewer
Principal Consultant/Manager at Tenzing
Real User
Top 5
Processes tremendous data easily
Pros and Cons
  • "The processing capacity is tremendous in the database."
  • "There is room for improvement in the documentation of processes and how it works."

What is our primary use case?

In our project, we deal with Duo Special Data, which requires a lot of computing resources. A traditional warehouse cannot handle the amount of data we use, and this is where Databricks comes into the picture.

What is most valuable?

The processing capacity is tremendous in the database. We use Azure as storage and have not faced any challenges. The connectors to different data sources are also valuable. Moreover, it is not a language-dependent tool, so development goes faster. It is one of the best features of Databricks.

What needs improvement?

There is room for improvement in the documentation of processes and how it works. I was trying to get one of the certifications, so I saw an area of improvement there. 

For how long have I used the solution?

I have been using Databricks for eight to nine months.

What do I think about the stability of the solution?

It is a stable product for us. We didn't see any challenges. 

What do I think about the scalability of the solution?

There are around 30 to 35 users in our organization. 

How was the initial setup?

The initial setup was easy because the third-party team made the clusters for us. 

What about the implementation team?

A third-party team enabled the cluster to make the setup easy for us. 

What other advice do I have?

I would advise using it based on the use case because it easily handles big data. It is your go-to tool if you are dealing with massive data. 

Overall, I would rate the solution a nine out of ten. The tool performs well across various use cases, documentation is available online, and it is compatible with big data systems like GCP, Azure, or AWS.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Lead Data Scientist at a manufacturing company with 10,001+ employees
Real User
Top 5
A great solution that has allowed for collaboration within our organization
Pros and Cons
  • "We have the ability to scale, collaborate and do machine learning."
  • "The product cannot be integrated with a popular coding IDE."

What is our primary use case?

Our primary use case for this solution is research for data scientists. The solution is deployed on cloud.

How has it helped my organization?

It has allowed our data engineers, data scientists, and analysts to collaborate and work on the same platform. 

What is most valuable?

We have the ability to scale, collaborate and do machine learning.

What needs improvement?

The product cannot be integrated with a popular coding IDE.

For how long have I used the solution?

We have been using this solution for approximately three years.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. There are five people using it in our organization.

How are customer service and support?

I rate my experience with customer service and support an eight out of ten.

Which solution did I use previously and why did I switch?

We previously used H2O.

How was the initial setup?

The initial setup was straightforward.

What about the implementation team?

Implementation was done in-house.

What was our ROI?

We have seen a return on investment.

What's my experience with pricing, setup cost, and licensing?

Licensing is charged on a yearly basis and costs between 25,000 and 30,000.

Which other solutions did I evaluate?

We evaluated other options but this solution was the best fit for what we required.

What other advice do I have?

I rate this solution nine out of ten. The solution is good but can be improved by integrating with a popular coding IDE.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Anand Sharma - PeerSpot reviewer
Sr Data Engineer at PIMCO
Real User
Supports several coding languages, good performance, and facilitates team collaboration
Pros and Cons
  • "The load distribution capabilities are good, and you can perform data processing tasks very quickly."
  • "In the future, I would like to see Data Lake support. That is something that I'm looking forward to."

What is our primary use case?

Our primary use case is ETL.

How has it helped my organization?

Using Databricks enables us to use the Data Mesh methodology, where every team performs their own ETL.

What is most valuable?

The most valuable feature is the versatility of the ecosystem. You can write code in SQL, Python, or Java.

The load distribution capabilities are good, and you can perform data processing tasks very quickly.

You can save and share notebooks between different teams.

The interface is easy to use.

What needs improvement?

The cost of this solution is on the expensive side.

In the future, I would like to see Data Lake support. That is something that I'm looking forward to.

For how long have I used the solution?

I worked with Databricks for approximately two years in my previous company.

What do I think about the scalability of the solution?

This is a very scalable solution. We have twenty-five data engineers that use it, and we may grow our usage.

How are customer service and support?

The technical support is okay. I would rate them a seven out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

We did not use another similar solution prior to Databricks.

How was the initial setup?

The cloud-based deployment is simple.

If you use an on-premises deployment then there is more to do.

What about the implementation team?

We deployed it with our in-house team.

There is no maintenance required.

What was our ROI?

We have seen a return on our investment with Databricks.

What's my experience with pricing, setup cost, and licensing?

Price-wise, I would rate Databricks a three out of five.

Which other solutions did I evaluate?

When we looked into Databricks, we evaluated Azure Data Factory and some of the others on the market. We found that Databricks was one of the easiest ones to use.

What other advice do I have?

My advice for anybody that is looking into Databricks is not to use the on-premises deployment. Instead, use the cloud-based setup.

In summary, this is a good product.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Rupal Sharma - PeerSpot reviewer
Data Architect at Three Ireland (Hutchison) - Infrastructure
Real User
Top 5
Processes large data for data science and data analytics purposes
Pros and Cons
  • "Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours."
  • "There is room for improvement in visualization."

What is our primary use case?

It's mainly used for data science, data analytics, visualization, and industrial analytics.

What is most valuable?

Specifically for data science and data analytics purposes, it can handle large amounts of data in less time. I can compare it with Teradata. If a job takes five hours with Teradata databases, Databricks can complete it in around three to three and a half hours.

So that's why it's quite convenient to use for data science, for training machine learning models. By using more computing power, you can make it even faster.
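As a rough check on the figures quoted above, the arithmetic can be worked through as follows. The five-hour and three-to-three-and-a-half-hour runtimes are the reviewer's own estimates; the helper names `time_saved_pct` and `speedup` are only illustrative.

```python
def time_saved_pct(baseline_hours, new_hours):
    """Percentage reduction in runtime relative to the baseline."""
    return (baseline_hours - new_hours) * 100 / baseline_hours


def speedup(baseline_hours, new_hours):
    """How many times faster the new runtime is than the baseline."""
    return baseline_hours / new_hours


teradata_hours = 5.0  # reviewer's figure for the Teradata job
for databricks_hours in (3.0, 3.5):  # reviewer's range for Databricks
    saved = time_saved_pct(teradata_hours, databricks_hours)
    factor = speedup(teradata_hours, databricks_hours)
    print(f"{databricks_hours} h: {saved:.0f}% less time, {factor:.2f}x faster")
```

On those figures, the job takes 30% to 40% less time, which is roughly 1.4 to 1.7 times faster.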

What needs improvement?

There is room for improvement in visualization.

For how long have I used the solution?

I used it for two years. I worked with the latest update. 

What do I think about the stability of the solution?

I would rate the stability a nine out of ten. I didn't face performance drops.

What do I think about the scalability of the solution?

I would rate the scalability an eight out of ten.

How are customer service and support?

Databricks' support is great. If we need any support, they are very quick with it. And they genuinely want you to use Databricks. So, whatever we ask them, they come up with multiple solutions to problem statements. That's really good.

Overall, the customer service and support are very good.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

I personally prefer using Databricks. We also considered using Snowflake, but the pricing was different. It's priced per query.

A data science or data analytics team needs to query the stored data again and again, so that pricing does not suit a data-heavy organization.

What was our ROI?

It's a good return on investment for Databricks from a delivery perspective. We delivered multiple dashboards, so it's quite a good return on investment. And being a small organization, everyone can use Databricks, and cost-wise, it's also good for small organizations.

Which other solutions did I evaluate?

If the company is a startup, Databricks might be suitable. If a big company needs a lot of storage, Teradata might be best for them. It depends on the situation.

What other advice do I have?

Overall, I would rate the solution an eight out of ten. I would definitely recommend this solution for small organizations.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Kevin McAllister - PeerSpot reviewer
Executive Manager at Hexagon AB
Real User
Top 5
Leaderboard
Excellent data transformation but data-serving performance could be better
Pros and Cons
  • "Databricks' most valuable feature is the data transformation through PySpark."
  • "Databricks' performance when serving the data to an analytics tool isn't as good as Snowflake's."

What is our primary use case?

We mainly use Databricks to ingest data and run the ELT processes that get it ready for analytics, and to serve the data to ThoughtSpot, which queries Databricks to get the data.

How has it helped my organization?

We didn't have any good tooling for ELT processing prior to Databricks. We were using Microsoft HDInsight, but it was taking too long to process the data. When we moved our ELT processes over to Databricks, the processing time dropped to a fraction of what HDInsight needed, so we were able to run jobs much faster.

What is most valuable?

Databricks' most valuable feature is the data transformation through PySpark.

What needs improvement?

Databricks' performance when serving the data to an analytics tool isn't as good as Snowflake's. In the next release, Databricks should include a better data-sharing platform to facilitate data sharing between companies.

For how long have I used the solution?

I've been using Databricks for three years.

What do I think about the stability of the solution?

Databricks' stability has been great, and I would rate it eight out of ten.

What do I think about the scalability of the solution?

Databricks is very scalable because it's very easy to spin up multiple clusters, but the cost of doing that is tremendous. I'd rate its scalability nine out of ten, but you'll pay for it.

How are customer service and support?

The technical support has been really bad, but that's because we don't have a direct agreement with Databricks.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I previously used HDInsight from Microsoft, but it took many, many hours to process data, so we switched to Databricks.

How was the initial setup?

The initial setup was pretty complex and required three people.

What about the implementation team?

We used an in-house team with some consulting help.

What was our ROI?

We've had a low ROI from Databricks.

What's my experience with pricing, setup cost, and licensing?

I would rate Databricks' pricing seven out of ten.

What other advice do I have?

I would advise anyone thinking of implementing Databricks to know their use case. For example, if you're looking for a big data repository to query data and do ELT processing, I recommend looking at other platforms, like Snowflake. However, if you're going to do AI and machine learning, then Databricks is probably stronger in that area. Overall, I would rate Databricks seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Shiva Prasad ELLUR - PeerSpot reviewer
Vice President - Data Engineering and Analytics at a financial services firm with 10,001+ employees
Real User
Top 5
A good, but expensive, web-based platform for automated cluster management with some coding limitations
Pros and Cons
  • "We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time."
  • "This solution only supports queries in SQL and Python, which is a bit limiting."

What is our primary use case?

We use this solution for advanced civilization power.

What is most valuable?

We like that this solution can handle a wide variety and velocity of data engineering, either in batch mode or real-time.

This product allows us to write the email models in a way that lets us take advantage of the parallel scaling computer window backend on any of the satellite services.

What needs improvement?

This solution only supports queries in SQL and Python, which is a bit limiting. 

This is a fairly expensive solution for any service outside of the basic package, and costs can add up quite quickly if there are large scaling requirements.

What do I think about the stability of the solution?

This is a stable solution in our experience.

What do I think about the scalability of the solution?

We have found that part of the beauty of this platform is that it is easy to scale and expand.

How are customer service and support?

The support for this product uses Microsoft as a middle man, and due to this there have been times when we experienced communication delays, as well as misunderstandings of what our issues are.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup for this solution is very simple.

What's my experience with pricing, setup cost, and licensing?

The basic version of this solution is now open-source, so there are no license costs involved. However, there is a charge for any advanced functionality and this can be quite expensive.

Which other solutions did I evaluate?

We looked at both Snowflake and BigQuery as a comparison with this solution. We chose this product as it offered more scalability and a higher level of security, which is extremely important in our banking environment.

What other advice do I have?

We would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sr. BigData Architect at ITC Infotech
MSP
Very elastic, easy to scale, and a straightforward setup
Pros and Cons
  • "It's easy to increase performance as required."
  • "Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it; however, they need to implement it to help the solution run more effectively."

What is our primary use case?

We work with clients in the insurance space mostly. Insurance companies need to process claims. Their claim systems run under Databricks, where we do multiple transformations of the data. 

What is most valuable?

The elasticity of the solution is excellent.

The storage, etc., can be scaled up quite easily when we need it to.

It's easy to increase performance as required.

The solution runs on Spark very well.

What needs improvement?

Instead of relying on a massive instance, the solution should offer micro partition levels. They're working on it; however, they need to implement it to help the solution run more effectively.

They're currently coming out with a new feature, which is Delta Lake. It will come with a new layer of data compliance.

For how long have I used the solution?

We've been using the solution for two years.

What do I think about the stability of the solution?

I don't see any stability issues, right down to the cluster. It will certainly be fine if it's maintained. It's highly available: even if things are dropped, it will still be up and running. I would describe it as very reliable. We don't have issues with crashing, and there aren't bugs or glitches that affect the way it works.

What do I think about the scalability of the solution?

The system is extremely scalable. It's one of its greatest features and a big selling point. If a company needs to scale or expand, they can do so very easily.

We use the solution daily, even though we don't work with Databricks directly on a day-to-day basis. We schedule everything we need, and the schedule triggers the work that needs to be done. Do you need to log into the database console every day? No. You just need to configure it one time and that's it. Then it will deliver everything needed in the time required.

How are customer service and technical support?

We are Microsoft enterprise customers, so we use Microsoft support. We raise service requests for Databricks through Microsoft. Overall, we've been satisfied with the support we've been given. They're responsive to our needs.

Which solution did I use previously and why did I switch?

We work with multiple clients and this solution is just one of the examples of products we work with. We use several others as well, depending on the client.

These products are all wrappers around the same underlying systems, for example, Spark, which is open-source. We've worked with Spark as well as the wrappers around it, whether the product was labeled Databricks, IBM insights, Cloudera, etc. These wrappers are all built on the same open-source system.

If we work with Azure data, we use Databricks. Otherwise, we have to create a VM separately. Those things are not needed because Azure is already providing those things for us.

How was the initial setup?

The situation may have been a bit different for me than for many users or organizations. I've been in this industry for more than 15 or 17 years. I have a lot of experience. I also took the time to do some research and preparation for the setup. It was straightforward for me.

The deployment with Microsoft usually can be done in 20 minutes. However, it can take 40 to 45 minutes to complete. An organization only requires one person to upload the data and have complete access to the account.

What about the implementation team?

I deployed the solution myself. I didn't require any assistance, so I didn't enlist any resellers or consultants to help with the process.

What's my experience with pricing, setup cost, and licensing?

The solution is expensive. It's not like a lot of competitors, which are open-source.

What other advice do I have?

There isn't really a version, per se. 

It's a popular service. I'd recommend the solution. The solution is cloud-agnostic right now, so it really can go into any cloud. It's the users who will be leveraging installed environments that can have these services, no matter if they are using Azure or Ubiquiti, or other systems.

I don't think you can find any other tool or any other service that is faster than Databricks. I don't see that right now. It's your best option.

Overall, I'd rate the solution eight out of ten. The reason I'm not giving it full marks is that it's expensive compared to open source alternatives. Also, the configuration is difficult, so sometimes you need to spend a couple of hours to get it right.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Buyer's Guide
Download our free Databricks Report and get advice and tips from experienced pros sharing their opinions.
Updated: May 2024