2019-10-14T12:39:00Z

What needs improvement with Databricks?


Please share with the community what you think needs improvement with Databricks.

What are its weaknesses? What would you like to see changed in a future version?

ITCS user
Guest
21 Answers

Top 10 Real User

The product is quite ambitious. It's trying to become a centralized platform for all data ingestion, transformation, and analytics needs. It may encounter stiff competition from best-of-breed solutions powered by open-source software. Overall, it's a good product; however, it might be challenged over time by individual best-of-breed products. For example, in the area of data science, RStudio seems to be the industry standard at the moment. The RStudio IDE is richer, and there is more out-of-the-box functionality, like push-button publishing. It's more difficult to run R within Databricks. Especially when it comes to synchronizing R packages, it lags behind. It doesn't even support the latest version of R 1.3. I believe all analytics will eventually converge into data science. The analytics of the future will be data science, because predicting the future will be one of the most prevalent use cases. The things we used to do, such as slicing and dicing, drilling through, and trend analysis, will become redundant once analytics toolsets are powered by AI/ML and fully automated. Unless organisations acquire platforms that can cater for machine learning and artificial intelligence, including natural language processing, they will have a hard time surviving. With Databricks, I would like to see more integration with, and accommodation of, open-source products. This could be controversial, as it could call into question the whole configuration and purpose of the product. I'm pretty sure Microsoft is trying to position it in a monopoly market, as they did with Windows and MS Office, so that they can charge a premium. We are beginning to see a similar product strategy behind Databricks.

2021-04-21T14:10:02Z
Top 5 Leaderboard Real User

The solution works very well for us. I can't recall any missing features or anything the solution really lacks. It's very complete. It would help if there were different versions of the solution on offer. The integration of data could be a bit better.

2021-04-16T14:25:06Z
Top 5 Leaderboard Real User

The user experience can be improved. It's not easy to use, and they need a better UI.

2021-03-29T17:53:14Z
Top 5 Leaderboard Real User

Databricks requires writing code in Python or SQL, so you have to be a good programmer to use it.

2021-02-25T13:40:16Z
Top 5 Leaderboard Vendor

Costs can quickly add up if you don't plan for it.

2021-01-10T08:08:17Z
Top 5 Real User

There is definitely room for improvement. This is the type of solution where you need people with technical expertise to use it. Other products are self-service and can be used by end users. Databricks is not geared towards end users, but rather towards data engineers and data scientists. I'm not sure whether Databricks is working towards changing that. It would be nice if it were more user-friendly, so that you don't have to rely on Power BI or another visualization tool. I know there is integration in the notebook where you can do it, but the relationships and semantics still make it more difficult. It would be better to do it right in Databricks: visualizations could be placed within the portal, so I wouldn't have to log out, bring the data into Power BI, and then visualize it.

2020-12-08T10:26:21Z
Top 10 Real User

Since the Databricks community is not that old, there is not a lot of information about some of the issues that we face. We have to go back to the Databricks stream to get resolutions for some of those issues. As time passes and more people put information out there about this technology, it will be helpful. Even with the features that we currently have, I think they're still optimizing some of the clusters and trying to parallelize to better read from other types of data. That's going really well: one of the features they recently came up with is a new data format, which was really good and speeds up a lot of the processes. I would like to see more documentation on how an end user could use it, so that users like me can easily try it and implement use cases.

2020-11-02T23:28:50Z
Top 10 Real User

I think it takes a lot of people to manage this solution. I'd like to see the people using this solution share their knowledge.

2020-10-04T06:40:24Z
Top 20 Real User

Sometimes we experience issues connecting our database to Databricks. Direct connectors are very limited. This should be addressed in the next release. Reading past data can also be tricky, as there is no data spectrum like you would find with Snowflake and other solutions.

2020-09-27T04:10:00Z
Top 20 Real User

I'd like to see more licensing options for the solution, such as additional pricing tiers. I understand it's not easy to achieve, because it's a platform-as-a-service type of solution. If you could be more specific about the parts you might or might not need, you could save some money and go for a lower tier. Of course, that would mean managing more configurations, which, as a user, would make things more complex, but it would be good to have the option. The pricing is not the cheapest, but that's understandable because it's a very high-end solution and easy to use; a lot of complexity is masked away. I would like to see additional monitoring tools and, in general, anything that improves data visualization. I know that's not the main point of Databricks and there are other tools that can be used, but anything that facilitates integrating Databricks with visualization tools would be really useful. Increased data scalability would also be great.

2020-08-02T08:16:42Z
Top 5 MSP

Instead of relying on a massive instance, the solution should offer micro-partitioning. They're working on it; however, they need to implement it to help the solution run more effectively. They're currently coming out with a new feature, Delta Lake, which will come with a new layer of data compliance.

2020-06-28T08:51:00Z
Top 5 Real User

I have seen better user interfaces, so that is something that can be improved. It was quite hard to deploy.

2020-04-13T06:27:36Z
Top 20 Real User

I think the automatic categorization of variables needs to be improved. The current functionality does not always efficiently identify the features of the data that is collected. That is probably the only thing I can think of. Apart from that, I have not explored the product enough yet to go into more depth, because there is only one asset project that I have taken on right now. Because I own this company, I have been doing more to run it than to explore this product very deeply. But when you get any form of data into it, if it could understand what types of variables there are and what features the data has, it would help massively in taking processing to the next step. If it does not identify the variables exactly, you may have to modify them a little. Apart from working with Databricks to understand its capabilities, I am also trying to learn Apache Spark right now. Some members of my team want to work with Apache Spark as a solution, and at this point we are evaluating both and plan to use either Spark or Databricks. As far as what might be added, some custom algorithm samples would be useful. All of the other products of this type (Azure, AWS, SageMaker) have customizable algorithms. You can implement a sort of workflow from them by modifying things in the sample and changing it to fit your purposes. That is probably something that might help in doing some small NDP (Near-Data Processing) development. It might not help in the project directly, but it would help while we work on some NDP development of our own, so that we can quickly evaluate how something is going to work. Templates or other samples could make working on things easier. That would also help massively in getting people to understand what the product can actually do. But I also think not many people would strongly agree with this.
Many people go to the first solution they can think of that they already know very well in the IT field, even if they can imagine that something could be better. To get value out of this technology, people will need to come to accept it. Technical people will accept Databricks more if they understand that it is something they can use and start working on without a lot of experience. Adoption will take time for new users who have no experience. But to feel like they can succeed with a product, they have to execute something in a very short time and see how it works. When you talk about AI, or really about anything new, people do not initially want to invest the time in discovery. These processes do take time to learn, but with templates or samples, you get to see immediately what the possibilities are and what you might get out of it. Then, when they try something of their own and get it working in less than a week, they will be encouraged to look into the product and the technology some more.

2020-02-09T08:17:00Z
Top 20 Consultant

Pricing is one of the things that could be improved. There could also be improvements in the visual analytics space and in the machine learning functions. I haven't explored them, so I don't know which functions and features are there. If they aren't there, then I think that's something they should consider including.

2020-02-05T08:05:00Z
Top 10 Real User

The management of the solution needs to be modernized. Managing the radius data is hard. The solution requires modern scoring: there's not a good way of knowing how the models are performing from a data science perspective. It doesn't necessarily need more model monitoring, but it does need more model scoring and performance measurement from a data science perspective. Databricks is an analytics platform, but it should offer more for data science and have more features for data scientists to work with.

2020-02-04T09:59:56Z
Top 20 Consultant

It would be very helpful if Databricks could integrate with platforms other than Azure. Having an open-source version, or the option to get a trial version of Databricks, would be very helpful. It would be very useful for beginners if there were tutorials and examples on how to write code in PySpark, R, or Scala. Examples would give people something to refer to and play with.

2020-01-07T06:27:00Z
Top 20 Real User

The solution could be improved by integrating it with data packets. Right now, the load tables provide a function like team collaboration. Still, it's unclear whether there's a function to create different branches. Our team had used data packets before; however, I feel it's difficult to integrate the current version with the previous data packets. The support could be improved a bit around the database. When we stream data to Data Lake, some of it cannot be loaded. Fixing this should be a priority.

2019-12-25T08:21:00Z
Top 5 Real User

Databricks should have more libraries for predictive analysis and machine learning, including more compatible and more advanced visualization and machine learning libraries. As it is now, I have to try a custom algorithm in order for things to be compatible. I would like to see more deep learning analytics.

2019-12-11T05:40:00Z
Top 10 Consultant

Some of the error messages we receive are too vague, saying things like "unknown exception", and these should be improved to make it easier for developers to debug problems. As it is now, we have to go into the driver logs to identify the errors properly. There is not much information about Databricks available online, such as pricing. Whenever we want to find out the actual cost, we have to send an email to Databricks, so having that information available on the internet would be helpful. I would like to see integration with Power BI or Tableau for business users. They may use Databricks to check on things, but it is a little complicated for them. They are used to the GUIs of Tableau and Power BI, so that integration would help.

2019-12-09T10:58:00Z
Top 5 Leaderboard Real User

Pricing could be improved. The product is a little expensive, although I think it's comparable to other similar options. The integration features could be more interesting and more involved. For example, we use the Databricks Notebook, which is not as great as Jupyter Notebook in terms of user experience. The look and feel are not the same, and we've had complaints from some of our users. They say that it's easier and more productive for them to use Jupyter Notebook. Then there is the integration feature for connecting to data sources, for example, Jupyter Notebook through Databricks Connect. The problem is that when you do that, you don't get all the Jupyter features, which is a shame for us. For additional features, having some PyTorch or TensorFlow type features built in would definitely be great. For now, my users are developing for themselves by importing their libraries into their notebooks and then creating models based on TensorFlow or PyTorch. That requires a lot of imports, particularly library imports, something that is now available in the new version of Machine Learning services. These things are very important because the data science community has shifted from the traditional way of preparing models to deep learning systems. It's now more common to have those features.

2019-12-03T10:44:00Z
Top 20 Real User

The product could be improved by offering an expansion of their visualization capabilities, which currently assists in development in their notebook environment. Perhaps a few connectors that auto-deploy to a reporting server? More parallelized Machine Learning libraries would be excellent for predictive analytics algorithms.

2019-10-14T12:39:00Z
Learn what your peers think about Databricks. Get advice and tips from experienced pros sharing their opinions. Updated: October 2021.