We performed a comparison between Apache Spark, Hortonworks Data Platform, and IBM InfoSphere BigInsights [EOL] based on real PeerSpot user reviews.
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"The product is useful for analytics."
"The most valuable feature of Apache Spark is its ease of use."
"The scalability is the key reason why we are on this platform."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"The product offers a fairly easy setup process."
"Ambari Web UI: user-friendly."
"It is a scalable platform."
"We use it for data science activities."
"Ranger for security; with Ranger we can manage users' permissions/access controls very easily."
"Now, using this solution, it is much cheaper to have all of the data available for searching, not in real-time, but whenever there is a pending request."
"InfoSphere Streams was the one core product from the platform that we were using. We were building a real-time response system and we built it on InfoSphere Streams."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"If you have a Spark session in the background, sometimes it's very hard to kill these sessions because of D allocation."
"The setup I worked on was really complex."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"The migration of data between different versions could be improved."
"The initial setup was not easy."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"More information could be there to simplify the process of running the product."
"It's at end of life and no longer will there be improvements."
"I would like to see more support for containers such as Docker and OpenShift."
"Since Cloudera acquired HDP, it's been bundled with CDH and HDP. However, the biggest challenge is cloud storage integration with Azure, GCP, and AWS."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"Security and workload management need improvement."
"It would also be nice if there were less coding involved."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"The UI was not interactive: Responses used to be very slow and hang up at times."