We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and IBM InfoSphere BigInsights [EOL] based on real PeerSpot user reviews.
"The deployment of the product is easy."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"I found the solution stable. We haven't had any problems with it."
"Provides a lot of good documentation compared to other solutions."
"The solution is very stable."
"Spark helps us reduce startup time for our customers and gives a very high ROI in the medium term."
"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"One of the key features is that Apache Spark is a distributed computing framework. You can have multiple slaves and distribute the workload between them."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"The most valuable feature is that I can use CDH for almost all use cases across all industries, including the financial sector, public sector, private retailers, and so on."
"The solution is stable."
"It is helpful to gather and process data."
"I don't see any performance issues."
"The scalability of Cloudera Distribution for Hadoop is excellent."
"The main advantage is the storage is less expensive."
"The solution is reliable and stable, it fits our requirements."
"InfoSphere Streams was the one core product from the platform that we were using. We were building a real-time response system and we built it on InfoSphere Streams."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"Apache Spark could potentially improve in terms of user-friendliness, particularly for individuals with a SQL background. While it's suitable for those with programming knowledge, making it more accessible to those without extensive programming skills could be beneficial."
"There could be enhancements in optimization techniques, as there are some limitations in this area that could be addressed to further refine Spark's performance."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"Apache Spark can improve the use case scenarios from the website. There is not any information on how you can use the solution across the relational databases toward multiple databases."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"The procedure for operations could be simplified."
"The pricing needs to improve."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The price of this solution could be lowered."
"This is a very expensive solution."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"It could be faster and more user-friendly."
"The UI was not interactive: Responses used to be very slow and hang up at times."