We performed a comparison between Apache Spark and Hortonworks Data Platform based on real PeerSpot user reviews.
Find out in this report how the two Hadoop solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"Spark can handle small to huge data and is suitable for any size of company."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"The most valuable feature of Apache Spark is its flexibility."
"The product is useful for analytics."
"The most valuable feature of Apache Spark is its memory processing because it processes data over RAM rather than disk, which is much more efficient and fast."
"There's a lot of functionality."
"The product offers a fairly easy setup process."
"The scalability is the key reason why we are on this platform."
"The Hortonworks solution is so stable. It is working as a production system, without any error, without any downtime. If I have downtime, it is mostly caused by the hardware of the computers."
"We use it for data science activities."
"Distributed computing, secure containerization, and governance capabilities are the most valuable features."
"The upgrades and patches must come from Hortonworks."
"Hortonworks should not be expensive at all to those looking into using it."
"The data platform is pretty neat. The workflow is also really good."
"Its UI can be better. Maintaining the history server is a little cumbersome, and it should be improved. I had issues while looking at the historical tags, which sometimes created problems. You have to separately create a history server and run it. Such things can be made easier. Instead of separately installing the history server, it can be made a part of the whole setup so that whenever you set it up, it becomes available."
"The solution must improve its performance."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"One limitation is that not all machine learning libraries and models support it."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
"The setup I worked on was really complex."
"The cost of the solution is high and there is room for improvement."
"Hive performance. If Hive performance increased, Hadoop would replace (not everywhere) traditional databases."
"Security and workload management need improvement."
"The version control of the software is also an issue."
"Deleting any service requires a lot of clean up, unlike Cloudera."
"It would also be nice if there were less coding involved."
"I work a lot with banking, IT and communications customers. Hortonworks must improve or must upgrade their services for these sectors."
"More information could be there to simplify the process of running the product."
Apache Spark is ranked 1st in Hadoop with 60 reviews, while Hortonworks Data Platform is ranked 6th in Hadoop with 25 reviews. Apache Spark is rated 8.4, while Hortonworks Data Platform is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Hortonworks Data Platform writes "Good for secure containerization, and governance capabilities". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Hortonworks Data Platform is most compared with Amazon EMR, Cloudera DataFlow and HPE Ezmeral Data Fabric. See our Apache Spark vs. Hortonworks Data Platform report.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.