We performed a comparison between Apache Spark, Cloudera Distribution for Hadoop, and HPE Ezmeral Data Fabric based on real PeerSpot user reviews.
"With Spark, we parallelize our operations, efficiently accessing both historical and real-time data."
"The solution is very stable."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"With Hadoop-related technologies, we can distribute the workload with multiple commodity hardware."
"Its scalability and speed are very valuable. You can scale it a lot. It is a great technology for big data. It is definitely better than a lot of earlier warehouse or pipeline solutions, such as Informatica. Spark SQL is very compliant with normal SQL that we have been using over the years. This makes it easy to code in Spark. It is just like using normal SQL. You can use the APIs of Spark or you can directly write SQL code and run it. This is something that I feel is useful in Spark."
"The main feature that we find valuable is that it is very fast."
"It provides a scalable machine learning library."
"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"The solution is reliable and stable, it fits our requirements."
"With a cluster available, you can manage the security layer using the shared SDX - it provides flexibility."
"The tool can be deployed using different container technologies, which makes it very scalable."
"The search function is the most valuable aspect of the solution."
"The scalability of Cloudera Distribution for Hadoop is excellent."
"We're now able to store large volumes of data through Cloudera Distribution for Hadoop. We're able to push large volumes of data to the platform, and that used to be a challenge, especially when storing a terabyte of information. This is the area where Cloudera Distribution for Hadoop improved the organization."
"Customer service and support were able to fix whatever the issue was."
"It is helpful to gather and process data."
"HPE Ezmeral Data Fabric can be accessed from any namespace globally as you would access it from a machine using an NFS."
"The model creation was very interesting, especially with the libraries provided by the platform."
"I like the administration part."
"My customers find the product cheaper compared to other solutions. The previous solution that we used did not have unified analytics like the runtime or the analog."
"It is a stable solution...It is a scalable solution."
"The initial setup was not easy."
"One limitation is that not all machine learning libraries and models support it."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"Needs to provide an internal scheduler to schedule Spark jobs with monitoring capability."
"I know there is always discussion about which language to write applications in and some people do love Scala. However, I don't like it."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"Technical expertise from an engineer is required to deploy and run high-tech tools, like Informatica, on Apache Spark, making it an area where improvements are required to make the process easier for users."
"There is a maximum of a one-gigabyte block size, which is an area of storage that can be improved upon."
"The Cloudera training has deteriorated significantly."
"Cloudera's support is extremely bad and cannot be relied on."
"It would be useful if Cloudera had more tools like SQL Engines that offer the traditional relational database. We have to do a lot of work preparing the data outside Cloudera before getting it into the platform."
"It could be faster and more user-friendly."
"The areas of improvement depend on the scale of the project. For banking customers, security features and an essential budget for commercial licenses would be the top priority. Data regulation could be the most crucial for a project with extensive data or an extra use case."
"The solution is not fit for on-premise distributions."
"The user infrastructure and user interface need to be improved, as well as the performance. The GUI needs to be better."
"Having the ability to extend the services provided by the platform to an API architecture, a micro-services architecture, could be very helpful."
"Upgrading Ezmeral to a new version is a pain. They're trying to make the solution more container-friendly, so I think they're going in the right direction. The only problem we've had in the past was the upgrades. The process isn't smooth due to how the Red Hat operating system upgrades currently work."
"HPE Ezmeral Data Fabric is not compatible with third-party tools."
"The deployment could be faster. I want more support for the data lake in the next release."
"The product is not user-friendly."