We compared Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"Apache Spark can do large volume interactive data analysis."
"There's a lot of functionality."
"The most valuable feature of Apache Spark is its flexibility."
"The data processing framework is good."
"Features include machine learning, real time streaming, and data processing."
"The product is useful for analytics."
"The most valuable feature of Pentaho is the Tableau report."
"Easy to use components to create the job."
"We were able to install it without any assistance from tech support."
"The initial setup is pretty straightforward."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"The solution needs to optimize shuffling between workers."
"When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"Apache Spark provides very good performance. The tuning phase is still tricky."
"Stream processing needs to be developed more in Spark. I have used Flink previously. Flink is better than Spark at stream processing."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"The graphical user interface (UI) could be a bit more clear. It's very hard to figure out the execution logs and understand how long it takes to send everything. If an execution is lost, it's not so easy to understand why or where it went. I have to manually drill down on the data processes which takes a lot of time. Maybe there could be like a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate."
"Version control would be a good addition."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"The repository should be improved."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Pentaho Business Analytics' user interface is outdated."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Another concern is that Pentaho is not customizable or interactive."
"Logging capability is needed."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Pentaho Business Analytics is ranked 19th in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Databricks, Microsoft Power BI, KNIME, SAP Crystal Reports and Microsoft SQL Server Reporting Services.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.