Anonymous User, Technical Consultant at a tech services company
Anonymous User, AD - Associate Director at a financial services firm
We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
"I found the solution stable. We haven't had any problems with it."
"The scalability has been the most valuable aspect of the solution."
"The most valuable feature of this solution is its capacity for processing large amounts of data."
"The solution is very stable."
"I feel the streaming is its best feature."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"The main feature that we find valuable is that it is very fast."
"The processing time is very much improved over the data warehouse solution that we were using."
"The feature I find most valuable is that the solution is easy to install and to work with. It starts with the installation and from there on the management is very simple and centralized."
"The search function is the most valuable aspect of the solution."
"Provides a viable open-source solution for enterprise implementations and reliable, intelligent data analysis."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve. I believe they are working on that."
"In terms of scalability, if you have enough hardware you can scale out. Scalability doesn't have any issues."
"The most valuable feature is Impala, the querying engine, which is very fast."
"We also really like the Cloudera community. You can have any question and will have your answer within a few hours."
"The most valuable feature is Kubernetes."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"The solution needs to optimize shuffling between workers."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"I would like to see an improvement in how the solution helps me to handle the whole cluster."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"The solution does not support multiple languages very well and this means users need to create work-arounds to implement some solutions."
"We experienced many issues when we started working with Hadoop 3.0 in the Cloudera 6.0 version, so there are a lot of things that need to improve."
"The one thing that we struggled with predominately was support. Because it was relatively new, support was always a big issue and I think it's still a bit of an ongoing concern with the team currently managing it."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"Without the big data environment, we cannot store all of this data live. We have billions of records and terabytes of storage to be used. It's not an option actually for us to have a big data environment."
"The price of this solution could be lowered."
"Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
"When comparing with Oracle Sybase and SQL, it's cheaper. It's not expensive."
"The price could be better for the product."
Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
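The contrast described above can be sketched in a few lines of plain Python. This is a single-machine toy, not Spark's API: `mapreduce_pass`, `WorkingSet`, `evict` and `demo` are hypothetical names invented for illustration, and a JSON file in a temporary directory stands in for cluster storage. It shows one disk-to-disk MapReduce-style pass, then a read-only working set that is kept in memory across several reductions and rebuilt from its recorded transformations if the cached copy is lost.

```python
import json
import tempfile
from functools import reduce
from pathlib import Path


def mapreduce_pass(in_path, out_path, map_fn, reduce_fn, initial):
    """One linear pass: read from disk, map, reduce, store the result on disk."""
    records = json.loads(Path(in_path).read_text())
    mapped = [map_fn(r) for r in records]
    result = reduce(reduce_fn, mapped, initial)
    Path(out_path).write_text(json.dumps(result))
    return result


class WorkingSet:
    """Toy stand-in for an RDD (assumption: single machine, no partitions)."""

    def __init__(self, source, transforms=()):
        self._source = source                # callable that re-reads the base data
        self._transforms = tuple(transforms) # recorded steps used to rebuild the data
        self._cache = None

    def map(self, fn):
        # Read-only: a transformation returns a new WorkingSet, never mutates this one.
        return WorkingSet(self._source, self._transforms + (fn,))

    def collect(self):
        if self._cache is None:              # compute once, then reuse from memory
            items = list(self._source())
            for fn in self._transforms:
                items = [fn(x) for x in items]
            self._cache = tuple(items)
        return list(self._cache)

    def reduce(self, fn, initial):
        return reduce(fn, self.collect(), initial)

    def evict(self):
        """Simulate losing the in-memory copy; the next collect() rebuilds it."""
        self._cache = None


def demo():
    with tempfile.TemporaryDirectory() as tmp:
        in_path = Path(tmp) / "input.json"
        out_path = Path(tmp) / "output.json"
        in_path.write_text(json.dumps([1, 2, 3, 4]))

        # MapReduce style: every job goes disk -> map -> reduce -> disk.
        disk_total = mapreduce_pass(
            in_path, out_path, lambda x: x * x, lambda a, b: a + b, 0
        )

        # Working-set style: load once, run several reductions from memory.
        squares = WorkingSet(lambda: json.loads(in_path.read_text())).map(
            lambda x: x * x
        )
        mem_total = squares.reduce(lambda a, b: a + b, 0)
        mem_max = squares.reduce(max, 0)

        squares.evict()                      # pretend the cached data was lost
        rebuilt = squares.collect()          # recomputed from source + transforms
    return disk_total, mem_total, mem_max, rebuilt
```

Calling `demo()` runs both styles on the same four-element input. In a real cluster the working set would be partitioned across machines and only the lost partitions would be recomputed; the sketch collapses that to a single cached tuple.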
Apache Spark is ranked 1st in Hadoop with 13 reviews while Cloudera Distribution for Hadoop is ranked 2nd in Hadoop with 11 reviews. Apache Spark is rated 8.2, while Cloudera Distribution for Hadoop is rated 7.6. The top reviewer of Apache Spark writes "Good Streaming features enable to enter data and analysis within Spark Stream". On the other hand, the top reviewer of Cloudera Distribution for Hadoop writes "Open-source solution for intelligent data management and analysis". Apache Spark is most compared with Spring Boot, Azure Stream Analytics, AWS Batch, SAP HANA and Amazon EMR, whereas Cloudera Distribution for Hadoop is most compared with Amazon EMR, HPE Ezmeral Data Fabric, Cassandra, Hortonworks Data Platform and MongoDB. See our Apache Spark vs. Cloudera Distribution for Hadoop report.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.