We compared Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.
"We use it for ETL purposes as well as for implementing the full transformation pipelines."
"The product’s most valuable feature is the SQL tool. It enables us to create a database and publish it."
"It's easy to prepare parallelism in Spark, run the solution with specific parameters, and get good performance."
"The good performance. The nice graphical management console. The long list of ML algorithms."
"The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
"I found the solution stable. We haven't had any problems with it."
"The most valuable feature is the Fault Tolerance and easy binding with other processes like Machine Learning, graph analytics."
"ETL and streaming capabilities."
"I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
"The initial setup is pretty straightforward."
"The most valuable feature of Pentaho is the Tableau report."
"We were able to install it without any assistance from tech support."
"Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."
"Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
"Easy to use components to create the job."
"I would like to see integration with data science platforms to optimize the processing capability for these tasks."
"We are building our own queries on Spark, and it can be improved in terms of query handling."
"They could improve the issues related to programming language for the platform."
"When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."
"It should support more programming languages."
"It requires overcoming a significant learning curve due to its robust and feature-rich nature."
"Apache Spark's GUI and scalability could be improved."
"Apache Spark is very difficult to use. It would require a data engineer. It is not available for every engineer today because they need to understand the different concepts of Spark, which is very, very difficult and it is not easy to learn."
"Version control would be a good addition."
"We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
"Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
"Another concern is that Pentaho is not customizable or interactive."
"Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."
"Logging capability is needed."
"The repository should be improved."
"Pentaho Business Analytics' user interface is outdated."
Apache Spark is ranked 1st in Hadoop with 60 reviews, while Pentaho Business Analytics is ranked 19th in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA, and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Microsoft Power BI, Databricks, KNIME, SAP Crystal Reports, and Microsoft SQL Server Reporting Services.
We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity by cross-referencing it with LinkedIn and, when necessary, following up personally with the reviewer.