We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and IBM Spectrum Computing based on real PeerSpot user reviews.
"DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort."
"Apache Spark can do large volume interactive data analysis."
"I appreciate everything about the solution, not just one or two specific features. The solution is highly stable. I rate it a perfect ten. The solution is highly scalable. I rate it a perfect ten. The initial setup was straightforward. I recommend using the solution. Overall, I rate the solution a perfect ten."
"There's a lot of functionality."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"Provides a lot of good documentation compared to other solutions."
"The solution is scalable."
"It is helpful to gather and process data."
"The most valuable feature is Kubernetes."
"Cloudera is a very manageable solution with good support."
"It has the best proxy, security, and support features compared to open-source products."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"The product provides better data processing features than other tools."
"Very good end-to-end security features."
"The solution is stable."
"Spectrum Computing's best features are its speed, robustness, and data processing and analysis."
"This solution is working for both VTL and tape."
"The most valuable feature is the backup capability."
"The most valuable aspect of the product is the policy driving resource management, to optimize the computing across data centers."
"Easy to operate and use."
"We are satisfied with the technical support; we have no issues."
"At times during the deployment process, the tool goes down, making it look less robust. To take care of the issues in the deployment process, users need to do manual interventions occasionally."
"The logging for the observability platform could be better."
"It should support more programming languages."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"Needs to provide an internal scheduler to schedule Spark jobs with monitoring capability."
"In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, do the transformation in a subsecond, and all that."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"One limitation is that not all machine learning libraries and models support it."
"The tool's ability to be deployed on a cloud model is an area of concern where improvements are required."
"The pricing needs to improve."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"There are multiple bugs when we update."
"Currently, we are using many other tools such as Spark and Blade Job to improve the performance."
"They should focus on upgrading their technical capabilities in the market."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"Cloudera's support is extremely bad and cannot be relied on."
"We'd like to see some AI model training for machine learning."
"Spectrum Computing is lagging behind other products, most likely because it hasn't been shifted to the cloud."
"Lack of sufficient documentation, particularly in Spanish."
"We have not been able to use deduplication."
"This solution is no longer managing tapes correctly."
"SMB storage and HPC are not compatible, and this should be supported by IBM Spectrum Computing."