We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and Hortonworks Data Platform based on real PeerSpot user reviews.
Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"It provides a scalable machine learning library."
"The most valuable feature of Apache Spark is its ease of use."
"We use Spark to process data from different data sources."
"The solution has been very stable."
"Apache Spark can do large volume interactive data analysis."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"CDH has a wide variety of proprietary tools that we use, like Impala. So from that perspective, it's quite useful as opposed to something open-source. We get a lot of value from Cloudera's proprietary tools."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"It has the best proxy, security, and support features compared to open-source products."
"The main advantage is the storage is less expensive."
"We also really like the Cloudera community. You can have any question and will have your answer within a few hours."
"Cloudera is a very manageable solution with good support."
"Very good end-to-end security features."
"The solution's most valuable feature is the enterprise data platform."
"Hortonworks should not be expensive at all to those looking into using it."
"Ranger for security; with Ranger we can manage users' permissions/access controls very easily."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"The upgrades and patches must come from Hortonworks."
"The scalability is the key reason why we are on this platform."
"Ambari Web UI: user-friendly."
"The product offers a fairly easy setup process."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"Needs to provide an internal scheduler to schedule Spark jobs with monitoring capability."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"Stability in terms of API (things were difficult when transitioning from RDDs to DataFrames, then to Datasets)."
"The setup I worked on was really complex."
"It should support more programming languages."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"The price of this solution could be lowered."
"There are multiple bugs when we update."
"Cloudera's support is extremely bad and cannot be relied on."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"It could be faster and more user-friendly."
"This is a very expensive solution."
"They should focus on upgrading their technical capabilities in the market."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"More information could be there to simplify the process of running the product."
"I would like to see more support for containers such as Docker and OpenShift."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"Security and workload management need improvement."
"It's at end of life and there will be no further improvements."
"The cost of the solution is high and there is room for improvement."
"It would also be nice if there were less coding involved."