Anonymous User, Co-Founder at a tech services company
We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
"What comes with the standard setup is what we mostly use, but Ambari is the most important."
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"It's good for storing historical data and handling analytics on a huge amount of data."
"The most valuable feature is the database."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it, because we started out so small. Companies that need to scale shouldn't have a problem doing so."
"The performance is pretty good."
"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
"Some of the best features are stored procedures, parallelism, and different indexing strategies."
"I think it scales really well and as long as you take enough time to learn a little bit about it, it works really well."
"In the next release, I would like to see Hive more responsive for smaller queries and to reduce the latency."
"There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution."
"The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
"It would be good to have more advanced analytics tools."
"It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
"The solution needs a better tutorial. There are only documents available currently. There are a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"The solution is very expensive."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"The areas of the solution that most need improvement are separating compute from storage and elasticity, which means scaling up and then retracting."
"The biggest problem we have is when the backup solution fails or is slow and we run out of log space, which has happened probably a couple of times in the last four years."
"In a traditional on-prem database, in a data warehouse, the solution is probably on the expensive side."
Apache Hadoop is ranked 7th in Data Warehouse with 8 reviews while IBM Db2 Warehouse is ranked 16th in Data Warehouse with 2 reviews. Apache Hadoop is rated 7.6, while IBM Db2 Warehouse is rated 8.0. The top reviewer of Apache Hadoop writes "Great micro-partitions, helpful technical support and quite stable". On the other hand, the top reviewer of IBM Db2 Warehouse writes "If you have good people designing how the data is stored, this is a marvelous tool". Apache Hadoop is most compared with Snowflake, Microsoft Azure Synapse Analytics, VMware Tanzu Greenplum and Oracle Exadata, whereas IBM Db2 Warehouse is most compared with Oracle Exadata, Microsoft Azure Synapse Analytics, Oracle Autonomous Data Warehouse, Teradata and Microsoft Parallel Data Warehouse.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.