We performed a comparison between Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.
"The main feature that we find valuable is that it is very fast."
"Apache Spark provides a very high-quality implementation of distributed data processing."
"The most valuable feature of Apache Spark is its ease of use."
"The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly."
"The processing time is very much improved over the data warehouse solution that we were using."
"Spark can handle small to huge data and is suitable for any size of company."
"We use Spark to process data from different data sources."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"The most valuable feature of Pentaho is the Tableau report."
"We were able to install it without any assistance from tech support."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"Easy to use components to create the job."
"The initial setup is pretty straightforward."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"The solution must improve its performance."
"This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"Apart from the restrictions that come with its in-memory implementation, it has been improved significantly up to version 3.0, which is currently in use."
"It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
"Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing."
"There were some problems related to the product's compatibility with a few Python libraries."
"It should support more programming languages."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"The repository should be improved."
"Logging capability is needed."
"Version control would be a good addition."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Pentaho Business Analytics' user interface is outdated."
"Another concern is that Pentaho is not customizable or interactive."
Apache Spark is ranked 1st in Hadoop with 60 reviews, while Pentaho Business Analytics is ranked 21st in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Microsoft Power BI, Databricks, Microsoft SQL Server Reporting Services, SAP Crystal Reports and Tableau.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.