We compared Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.
"The distribution of tasks, like the seamless map-reduce functionality, is quite impressive."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
"The features we find most valuable are the machine learning, data learning, and Spark Analytics."
"I feel the streaming is its best feature."
"The solution has been very stable."
"The product’s most valuable features are lazy evaluation and workload distribution."
"The most valuable feature of Apache Spark is its ease of use."
"The initial setup is pretty straightforward."
"Easy to use components to create the job."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"We were able to install it without any assistance from tech support."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"The most valuable feature of Pentaho is the Tableau report."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
"Apache Spark could improve the connectors that it supports. There are a lot of open-source databases in the market. For example, cloud databases, such as Redshift, Snowflake, and Synapse. Apache Spark should have connectors present to connect to these databases. There are a lot of workarounds required to connect to those databases, but it should have inbuilt connectors."
"When using Spark, users may need to write their own parallelization logic, which requires additional effort and expertise."
"Needs to provide an internal schedule to schedule spark jobs with monitoring capability."
"They could improve the issues related to programming language for the platform."
"We use big data manager but we cannot use it as conditional data so whenever we're trying to fetch the data, it takes a bit of time."
"Version control would be a good addition."
"Pentaho Business Analytics' user interface is outdated."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Logging capability is needed."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"Another concern is that Pentaho is not customizable or interactive."
"The repository should be improved."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
Apache Spark is ranked 1st in Hadoop with 60 reviews while Pentaho Business Analytics is ranked 19th in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Microsoft Power BI, Databricks, KNIME, SAP Crystal Reports and Microsoft SQL Server Reporting Services.