We performed a comparison between Databricks and H2O.ai based on real PeerSpot user reviews.
Find out in this report how the two Data Science Platforms compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"It's easy to increase performance as required."
"The solution is built from Spark and has integration with MLflow, which is important for our use case."
"Ability to work collaboratively without having to worry about the infrastructure."
"The setup is quite easy."
"Databricks' Lakehouse architecture has been most useful for us. The data governance has been absolutely efficient in between other kinds of solutions."
"Its lightweight and fast processing are valuable."
"Databricks provides a consistent interface for data engineers to work with data in a consistent language on a single integrated platform for ingesting, processing, and serving data to the end user."
"Databricks is a unified solution that we can use for streaming. It is supporting open source languages, which are cloud-agnostic. When I do database coding if any other tool has a similar language pack to Excel or SQL, I can use the same knowledge, limiting the need to learn new things. It supports a lot of Python libraries where I can use some very easily."
"One of the most interesting features of the product is their driverless component. The driverless component allows you to test several different algorithms along with navigating you through choosing the best algorithm."
"The ease of use in connecting to our cluster machines."
"It is helpful, intuitive, and easy to use. The learning curve is not too steep."
"The most valuable features are the machine learning tools, the support for Jupyter Notebooks, and the collaboration that allows you to share it across people."
"Fast training, memory-efficient DataFrame manipulation, well-documented, easy-to-use algorithms, ability to integrate with enterprise Java apps (through POJO/MOJO) are the main reasons why we switched from Spark to H2O."
"AutoML helps in hands-free initial evaluations of efficiency/accuracy of ML algorithms."
"The solution could be improved by adding a feature that would make it more user-friendly for our team. The feature is simple, but it would be useful. Currently, our team is more familiar with the language R, but Databricks requires the use of Jupyter Notebooks which primarily supports Python. We have tried using RStudio, but it is not a fully integrated solution. To fully utilize Databricks, we have to use the Jupyter interface. One feature that would make it easier for our team to adopt the Jupyter interface would be the ability to select a specific variable or line of code and execute it within a cell. This feature is available in other Jupyter Notebooks outside of Databricks and in our own IDE, but it is not currently available within Databricks. If this feature were added, it would make the transition to using Databricks much smoother for our team."
"Would be helpful to have additional licensing options."
"The pricing of Databricks could be cheaper."
"It would be better if it were faster. It can be slow, and it can be super fast for big data. But for small data, sometimes there is a sub-second response, which can be considered slow. In the next release, I would like to have automatic creation of APIs because they don't have it at the moment, and I spend a lot of time building them."
"I would like it if Databricks adopted an interface more like R Studio. When I create a data frame or a table, R Studio provides a preview of the data. In R Studio, I can see that it created a table with so many columns or rows. Then I can click on it and open a preview of that data."
"I have seen better user interfaces, so that is something that can be improved."
"Databricks doesn't offer the use of Python scripts by itself and is not connected to GitHub repositories or anything similar. This is something that is missing. If they could integrate with Git tools, it would be an advantage."
"I would like it if Databricks made it easier to set up a project."
"The model management features could be improved."
"On the topic of model training and model governance, this solution cannot handle ten or twelve models running at the same time."
"I would like to see more features related to deployment."
"Referring to bullet-3 as well, H2O DataFrame manipulation capabilities are too primitive."
"It lacks the data manipulation capabilities of R and Pandas DataFrames. We would kill for dplyr offloading H2O."
"The interpretability module has room for improvement. Also, it needs to improve its ability to integrate with other systems, like SageMaker, and the overall integration capability."
"It needs a drag and drop GUI like KNIME, for easy access to and visibility of workflows."
Databricks is ranked 1st in Data Science Platforms with 78 reviews while H2O.ai is ranked 20th in Data Science Platforms. Databricks is rated 8.2, while H2O.ai is rated 7.6. The top reviewer of Databricks writes "A nice interface with good features for turning off clusters to save on computing". On the other hand, the top reviewer of H2O.ai writes "It is helpful, intuitive, and easy to use. The learning curve is not too steep". Databricks is most compared with Amazon SageMaker, Informatica PowerCenter, Dataiku, Dremio and Microsoft Azure Machine Learning Studio, whereas H2O.ai is most compared with Amazon SageMaker, Dataiku, Microsoft Azure Machine Learning Studio, KNIME and IBM Watson Studio. See our Databricks vs. H2O.ai report.
We monitor all Data Science Platforms reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.