H2O.ai Overview

H2O.ai is the #14 ranked solution in our list of top Data Science Platforms. It is most often compared to Dataiku Data Science Studio.

What is H2O.ai?

H2O is a fully open source, distributed in-memory machine learning platform with linear scalability. H2O supports the most widely used statistical and machine learning algorithms, including gradient boosted machines, generalized linear models, deep learning, and more. H2O also has an industry-leading AutoML functionality that automatically runs through all the algorithms and their hyperparameters to produce a leaderboard of the best models. The H2O platform is used by over 14,000 organizations globally and is extremely popular in both the R and Python communities.
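H2O's own Python entry point for AutoML is the `H2OAutoML` class in `h2o.automl`, which needs a running H2O cluster. To illustrate only the leaderboard idea without that dependency, here is a minimal, stdlib-only sketch under stated assumptions: the candidate models (a mean baseline, one-feature least squares, and k-nearest-neighbours with a few values of `k`), the toy data, and RMSE as the ranking metric are all choices made for this example, not H2O's actual algorithms or defaults.

```python
import math
import random

# Each candidate "fit" function takes training pairs (x, y) and returns a predictor x -> y_hat.


def fit_mean(train):
    """Baseline: always predict the training-set mean."""
    mu = sum(y for _, y in train) / len(train)
    return lambda x: mu


def fit_linear(train):
    """Ordinary least squares with a single feature."""
    n = len(train)
    mx = sum(x for x, _ in train) / n
    my = sum(y for _, y in train) / n
    sxx = sum((x - mx) ** 2 for x, _ in train)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda x: intercept + slope * x


def make_knn(k):
    """k-nearest-neighbours regression; k is the hyperparameter being searched."""
    def fit(train):
        def predict(x):
            nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
            return sum(y for _, y in nearest) / len(nearest)
        return predict
    return fit


def rmse(model, data):
    return math.sqrt(sum((model(x) - y) ** 2 for x, y in data) / len(data))


def leaderboard(train, valid, candidates):
    """Fit every candidate, score it on held-out data, and rank by RMSE (lower is better)."""
    rows = [(name, rmse(fit(train), valid)) for name, fit in candidates]
    return sorted(rows, key=lambda row: row[1])


# Toy data: y = 2x + 1 plus Gaussian noise, split 80/20 into train/validation.
rng = random.Random(0)
data = [(i / 10, 2.0 * (i / 10) + 1.0 + rng.gauss(0, 0.5)) for i in range(100)]
rng.shuffle(data)
train, valid = data[:80], data[80:]

candidates = [("mean", fit_mean), ("linear", fit_linear)]
candidates += [(f"knn_k{k}", make_knn(k)) for k in (1, 5, 15)]

board = leaderboard(train, valid, candidates)
for name, score in board:
    print(f"{name:10s} RMSE={score:.3f}")
```

The design point is the loop in `leaderboard`: every algorithm/hyperparameter combination is trained and scored on the same held-out data, so the ranking is a fair comparison. A real AutoML system adds time budgets, cross-validation, and ensembling on top of this.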

Buyer's Guide

Download the Data Science Platforms Buyer's Guide including reviews and more. Updated: September 2021

H2O.ai Customers

poder.io, Stanley Black & Decker, G5, PWC, Comcast, Cisco

Archived H2O.ai Reviews (more than two years old)

DR
Supervisor in Research and Development Area with 1,001-5,000 employees
Real User
We're hoping to save costs on internal development but keep enough flexibility to choose ML techniques and performance indicators

What is our primary use case?

The idea is to migrate our current model development practice to another platform and, after that, try to create a proprietary platform using R and Python. The company is interested in using an external platform in order to have an up-to-date environment.

How has it helped my organization?

We are still working on it. The idea is to save the cost of internal development while keeping enough flexibility to choose ML techniques and performance indicators.

What is most valuable?

We are still working on it.

What needs improvement?

Feature engineering.

For how long have I used the solution?

Still implementing.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
DC
Managing VP of Machine Learning at a financial services firm with 10,001+ employees
Real User
The driverless component allows you to test several different algorithms along with navigating you through choosing the best algorithm, but the interpretability module has room for improvement

Pros and Cons

  • "One of the most interesting features of the product is their driverless component. The driverless component allows you to test several different algorithms along with navigating you through choosing the best algorithm."
  • "The interpretability module has room for improvement. Also, it needs to improve its ability to integrate with other systems, like SageMaker, and the overall integration capability."

What is our primary use case?

Our primary use case is machine learning.

How has it helped my organization?

It has enabled our workforce to be more efficient.

What is most valuable?

One of the most interesting features of the product is their driverless component. The driverless component allows you to test several different algorithms and guides you through choosing the best one. It also gives you an interpretability capability, which gives you some understanding of what's inside the algorithm and why it behaves a certain way, so you can make sure you are not biased towards the outcome.
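The review does not say which techniques the interpretability module uses. As a generic illustration of one common model-agnostic way to ask "why is the model behaving this way", here is a minimal sketch of permutation feature importance, a technique swapped in for this example and not necessarily what H2O implements. The `black_box` model and the random data are hypothetical.

```python
import random


def accuracy(model, X, y):
    """Fraction of rows where the model's prediction matches the label."""
    return sum(1 for row, target in zip(X, y) if model(row) == target) / len(y)


def permutation_importance(model, X, y, n_repeats=5, seed=0):
    """Mean drop in accuracy when each feature column is shuffled independently.

    A large drop means the model leans on that feature; a drop near zero
    means the model barely uses it.
    """
    rng = random.Random(seed)
    baseline = accuracy(model, X, y)
    importances = []
    for j in range(len(X[0])):
        drops = []
        for _ in range(n_repeats):
            column = [row[j] for row in X]
            rng.shuffle(column)
            # Rebuild each row with feature j replaced by its shuffled value.
            X_perm = [row[:j] + [value] + row[j + 1:] for row, value in zip(X, column)]
            drops.append(baseline - accuracy(model, X_perm, y))
        importances.append(sum(drops) / n_repeats)
    return importances


# Hypothetical "black box": it only looks at feature 0 and ignores feature 1.
def black_box(row):
    return 1 if row[0] > 0.5 else 0


data_rng = random.Random(42)
X = [[data_rng.random(), data_rng.random()] for _ in range(200)]
y = [black_box(row) for row in X]

importances = permutation_importance(black_box, X, y)
print(importances)  # feature 0 gets a clear positive score, feature 1 gets 0.0
```

Because the method only needs predictions, it works on any model, which is why checks like this are useful for spotting a model that relies on a feature it should not.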

What needs improvement?

The interpretability module has room for improvement. Also, it needs to improve its ability to integrate with other systems, like SageMaker, and the overall integration capability.

I would like more support for scalability and deep learning. Right now, they are very strong in supervised and unsupervised learning, but not in deep learning. I'd like to see them become more well-rounded, with support for deep learning, but I'm not sure that is their business model.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

In terms of the stress we have put on it, it is still very early days for us; we have not yet really taken it through its paces.

What do I think about the scalability of the solution?

It does appear to scale. We have very large use cases. The product scales as advertised. 

How is customer service and technical support?

They have excellent tech support.

How was the initial setup?

It was fairly easy to set up and get up and running.

Which other solutions did I evaluate?

It was already selected. I don't know what process the company went through.

What other advice do I have?

Do your due diligence and make sure, given your use cases, that this is the right product for you.

Directionally, they are headed in the right place. They're also putting a lot of muscle behind it, but they're very focused on one area: supervised and unsupervised learning is the market that they're going after. If that's their strategy, they'll capture part of the market, but they'll leave the other part of the market behind.

We use just the AWS version of the product.

It integrates well with our notebooks. It also integrates well with our homegrown tool sets.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
MH
Data Scientist with 51-200 employees
Real User
It is easy to connect to our cluster machines. I would like to see more features related to deployment.

What is our primary use case?

We use it for building models with large amounts of data.

How has it helped my organization?

We are using it for prototype projects. We have not deployed it.

What is most valuable?

The ease of use in connecting to our cluster machines.

What needs improvement?

I would like to see more features related to deployment.

For how long have I used the solution?

Trial/evaluations only.

What do I think about the stability of the solution?

As this was just for a prototype, we did not stress the product too much.

What do I think about the scalability of the solution?

For the use case that I had, I did not run into any scaling issues. Therefore, it worked well in terms of scalability.

How is customer service and technical support?

I didn't run into any issues, as the application was very clear. So, I did not contact technical support.

What other advice do I have?

It handles its core functionality well. The product is definitely worth looking at, as it is one of the up-and-coming products in which you can build large models for your use cases.

I am using the on-premise version.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Rahul Koduru
Director of Data Engineering at Transamerica
Real User
It is helpful, intuitive, and easy to use. The learning curve is not too steep.

Pros and Cons

  • "It is helpful, intuitive, and easy to use. The learning curve is not too steep."
  • "The model management features could be improved."

What is our primary use case?

Our primary use case is for data science. Some of our data scientists use it pretty heavily to build models.

How has it helped my organization?

One example: we were able to automate life insurance underwriting. When somebody applies for a policy, we take their blood and then assign them a risk class: substandard, standard, preferred, etc. We price our products depending on this class. Usually, the blood goes to a lab, we get the lab results back, and then an underwriter reviews them. Getting a rating this way usually takes about two weeks. We were able to build models to automate all of this, and now it happens in real time. Somebody can apply online and be issued a policy right away.

What is most valuable?

It is helpful, intuitive, and easy to use. The learning curve is not too steep.

What needs improvement?

The model management features could be improved.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

We haven't put a lot of stress on it.

What do I think about the scalability of the solution?

Our database environment is probably about 900TB in size.

So far, the product has been good from a scalability perspective.

How is customer service and technical support?

I would rate the technical support as an eight out of ten.

How was the initial setup?

The integration and configuration were good. I would rate them as an eight out of ten.

What was our ROI?

We have seen significant ROI where we were able to use the product in certain key projects and could automate a lot of processes. We were even able to reduce staff.

Which other solutions did I evaluate?

We looked at Amazon SageMaker on AWS. 

This product was still open source at that point; we got proprietary support later. The other products were not open source, so we couldn't really try them out beforehand to see whether we liked them.

H2O.ai is a great product for data scientists in general. It has a lot of options and is really flexible. Also, the pricing was good.

What other advice do I have?

H2O.ai works directly with a lot of our cloud data, our big data environment, and our Amazon Redshift environment. Integration with the big data environment was easier and performed better than with Amazon Redshift. That is because our big data environment is still on-premise, whereas Redshift is in the cloud, so we had to go through some struggles to get it working with Redshift.

We also use the on-premise version.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user862530
Associate Consultant at a tech services company with 201-500 employees
Consultant
AutoML helps with hands-free evaluations of ML algorithms, but the solution needs a GUI

What is our primary use case?

Testing/modeling data in the initial stages of approaching a machine-learning problem. Environment: Laptops running Ubuntu 16.04/Python 3.

What is most valuable?

AutoML enables hands-free initial evaluation of the efficiency and accuracy of ML algorithms on the training input data.

What needs improvement?

It needs a drag-and-drop GUI like KNIME's, for easy access to and visibility of workflows.

For how long have I used the solution?

Less than one year.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
it_user837546
Principal Data Scientist
Real User
Provides fast training, memory-efficient DataFrame manipulation, well-documented and easy-to-use algorithms

Pros and Cons

  • "Fast training, memory-efficient DataFrame manipulation, well-documented, easy-to-use algorithms, ability to integrate with enterprise Java apps (through POJO/MOJO) are the main reasons why we switched from Spark to H2O."
  • "Referring to bullet-3 as well, H2O DataFrame manipulation capabilities are too primitive."
  • "It lacks the data manipulation capabilities of R and Pandas DataFrames. We would kill for dplyr offloading H2O."

What is our primary use case?

We currently use H2O for real-time predictive analytics for fraud prevention. We have a Java-based feature engineering pipeline that attaches to POJO objects obtained from H2O. We run the whole pipeline on lightweight VMs and get under 1ms latency for each real-time transaction scoring.
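In the reviewer's setup, the model is exported from H2O as a POJO and called from a Java feature engineering pipeline, so scoring is an in-process function call rather than a request to a cluster. The following Python sketch mirrors only that shape under stated assumptions: the feature names, the coefficients, and the logistic form are hypothetical stand-ins for an exported model, not H2O's generated code or API.

```python
import math
import time

# Hypothetical coefficients standing in for an exported model; in the reviewer's
# setup this role is played by an H2O POJO called from Java.
COEFFICIENTS = {"intercept": -4.0, "amount_log": 0.6, "is_foreign": 1.2, "is_night": 0.8}


def engineer_features(txn):
    """Turn a raw transaction into the numeric features the model expects."""
    return {
        "amount_log": math.log1p(txn["amount"]),
        "is_foreign": 1.0 if txn["country"] != txn["home_country"] else 0.0,
        "is_night": 1.0 if txn["hour"] < 6 else 0.0,
    }


def score(features):
    """Logistic score computed in-process: no network call, just arithmetic."""
    z = COEFFICIENTS["intercept"] + sum(
        COEFFICIENTS[name] * value for name, value in features.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


def score_transaction(txn):
    """Return (fraud probability, elapsed milliseconds) for one transaction."""
    start = time.perf_counter()
    probability = score(engineer_features(txn))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return probability, elapsed_ms


risky = {"amount": 950.0, "country": "BR", "home_country": "US", "hour": 3}
routine = {"amount": 12.5, "country": "US", "home_country": "US", "hour": 14}
for txn in (risky, routine):
    p, ms = score_transaction(txn)
    print(f"p(fraud)={p:.3f} in {ms:.4f} ms")
```

Keeping feature engineering and scoring in the same process is what makes per-transaction latency small; actual numbers depend on the model size and the host.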

How has it helped my organization?

Previously, we needed a four-machine Spark cluster and hours of time during the modeling phase to train an ML model on tens of millions of transactions. Now the same training can be done on an old MacBook Pro with 8GB of RAM within a few minutes.

What is most valuable?

  • Fast training
  • Memory-efficient DataFrame manipulation
  • Well-documented, easy-to-use algorithms
  • Ability to integrate with enterprise Java apps (through POJO/MOJO) 

These are the main reasons why we switched from Spark to H2O.

What needs improvement?

Referring to bullet-3 as well, H2O DataFrame manipulation capabilities are too primitive.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

We ran into a few bugs and opened JIRA tickets with reproducible test cases. We found workarounds for the problems ourselves.

What do I think about the scalability of the solution?

No issues with scalability; it works smoothly.

How are customer service and technical support?

We use the open-source/community branch and get support through forum discussions.

Which solution did I use previously and why did I switch?

We used to develop on Scala + Spark ML. We switched, at least in part, for the reasons mentioned in the Valuable Features section of this review.

How was the initial setup?

Initial setup is very easy through pip, JAR download, or R install.packages.

What's my experience with pricing, setup cost, and licensing?

Currently, we do not purchase enterprise support.

Which other solutions did I evaluate?

We have experience with pretty much everything available; hence, the switch was a natural and informed decision.

What other advice do I have?

We rate it eight out of 10. It is very fast, lightweight, well-documented, and low-maintenance. It is not rated 10 because it lacks the data manipulation capabilities of R and Pandas DataFrames. We would kill for dplyr offloading H2O.

Disclosure: I am a real user, and this review is based on my own experience and opinions.