Apache Spark vs Pentaho Business Analytics comparison

Apache Spark (Apache)
2,430 views | 1,869 comparisons
89% willing to recommend
Pentaho Business Analytics (Hitachi Vantara)
1,134 views | 830 comparisons
89% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Apache Spark and Pentaho Business Analytics based on real PeerSpot user reviews.

Find out what your peers are saying about Apache, Cloudera, Amazon Web Services (AWS) and others in Hadoop.
To learn more, read our detailed Hadoop Report (Updated: May 2024).
771,212 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Apache Spark
  • "ETL and streaming capabilities."
  • "We use Spark to process data from different data sources."
  • "AI libraries are the most valuable. They provide extensibility and usability. Spark has a lot of connectors, which is a very important and useful feature for AI. You need to connect a lot of points for AI, and you have to get data from those systems. Connectors are very wide in Spark. With a Spark cluster, you can get fast results, especially for AI."
  • "The memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it's in the cluster itself. It acts as a memory engine and is very effective in processing data correctly."
  • "The most valuable feature of Apache Spark is its ease of use."
  • "The tool's most valuable feature is its speed and efficiency. It's much faster than other tools and excels in parallel data processing. Unlike tools like Python or JavaScript, which may struggle with parallel processing, it allows us to handle large volumes of data with more power easily."
  • "The most crucial feature for us is the streaming capability. It serves as a fundamental aspect that allows us to exert control over our operations."
  • "I feel the streaming is its best feature."

Pentaho Business Analytics
  • "The most valuable feature of Pentaho is the Tableau report."
  • "We were able to install it without any assistance from tech support."
  • "Easy to use components to create the job."
  • "I use the BI Server, CDE Dashboards, Saiku, and Kettle, because these tools are very good and highly experienced."
  • "The initial setup is pretty straightforward."
  • "Pentaho is an analytics platform that can be used when an organization has a lot of big data storage systems already installed and needs to manage and analyze that data. It has a specific use case for unstructured data, such as documents, and needs to be able to search and analyze it."
  • "Pentaho Business Analytics' best features include the ease of developing data flows and the wide range of options to connect to databases, including those on the cloud."

Cons
Apache Spark
  • "When you are working with large, complex tasks, the garbage collection process is slow and affects performance."
  • "We are building our own queries on Spark, and it can be improved in terms of query handling."
  • "This solution currently cannot support or distribute neural network related models, or deep learning related algorithms. We would like this functionality to be developed."
  • "We've had problems using a Python process to try to access something in a large volume of data. It crashes if somebody gives me the wrong code because it cannot handle a large volume of data."
  • "More ML based algorithms should be added to it, to make it algorithmic-rich for developers."
  • "It needs a new interface and a better way to get some data. In terms of writing our scripts, some processes could be faster."
  • "It requires overcoming a significant learning curve due to its robust and feature-rich nature."
  • "Apache Spark's GUI and scalability could be improved."

Pentaho Business Analytics
  • "The repository should be improved."
  • "Another concern is that Pentaho is not customizable or interactive."
  • "Logging capability is needed."
  • "Deployment is not simple. It is not simple because we are dealing with a lot of data; we are dealing with a lot of storage. So, it's not a simple process."
  • "We did not achieve the ROI. The work delivered to users had lesser value than the subscription cost."
  • "Version control would be a good addition."
  • "Pentaho Business Analytics' user interface is outdated."
  • "Pentaho, at the general level, should greatly improve the easy construction of its dashboards and easy integration of information from different sources without technical user intervention."

Pricing and Cost Advice
Apache Spark
  • "Since we are using the Apache Spark version, not the Databricks version, it is an Apache license version, the support and resolution of the bug are actually late or delayed. The Apache license is free."
  • "Apache Spark is open-source. You have to pay only when you use any bundled product, such as Cloudera."
  • "We are using the free version of the solution."
  • "Apache Spark is not too cheap. You have to pay for hardware and Cloudera licenses. Of course, there is a solution with open source without Cloudera."
  • "Apache Spark is an expensive solution."
  • "Spark is an open-source solution, so there are no licensing costs."
  • "On the cloud model can be expensive as it requires substantial resources for implementation, covering on-premises hardware, memory, and licensing."
  • "It is an open-source solution, it is free of charge."

Pentaho Business Analytics
  • "Free and commercial versions are available."
  • "Pentaho is expensive."
  • "We were lucky enough to find a Pentaho OEM partner who offered a data warehouse model and the ETL software for about 60K SGD per year."

    Comparison Review
    Anonymous User
    Any company (be it technology, manufacturing, human resources, e-commerce, SME, etc.) always needs Business Intelligence to some extent. If cost is one of the considerations, the two BI tools at the forefront are Pentaho and Jaspersoft. But these companies are often caught in an imbroglio over which tool to use, what the differences are for technology and business end users, whether they actually need to purchase the commercial edition, whether there is any workaround, and so on.

    Differences: in the points below, I have tried to cover the differences in functionality.

    a. Reports: Jaspersoft is known for its pixel-perfect reporting. Jasper uses iReport for designing reports. Hence, for reports, Jaspersoft is the ideal candidate. Pentaho uses Pentaho Report Designer.

    b. Dashboards: Pentaho provides much more capability and interactivity in terms of dashboards. Dashboards designed in Pentaho are far superior in functionality and aesthetics to those in Jaspersoft. Pentaho CE uses CDE/CDF; Pentaho EE uses PDD. Dashboard functionality is present only in the Enterprise edition of Jaspersoft.

    c. Pentaho has an intermediate layer known as Xactions, which provides much more flexibility in terms of plugin design, integration with applications, out-of-the-box experience, etc. Xactions supports scripting and scheduling of script execution. Jaspersoft doesn't provide that much flexibility in terms of…
    Questions from the Community
    Top Answer: We use Spark to process data from different data sources.
    Top Answer: In data analysis, you need to take real-time data from different data sources. You need to process this in a subsecond, and do the transformation in a subsecond.
    Top Answer: There are many. It would rather depend on what system BI architecture or enterprise legacy you have at your end. I would recommend as follows: 1) If you have legacies of SAP, Oracle, look for SAP…
    Top Answer: The organization has both options based on their needs and budget constraints. The Enterprise Edition is expensive with references to an added number of features.
    Top Answer: The product to me is not as user-friendly as other players in the market. It also still needs improvement in the reporting module. You will need to search for deployment examples or need to have a…
    Ranking
    Apache Spark (1st out of 22 in Hadoop)
      Views: 2,430 | Comparisons: 1,869 | Reviews: 26 | Average Words per Review: 444 | Rating: 8.7
    Pentaho Business Analytics
      Views: 1,134 | Comparisons: 830 | Reviews: 4 | Average Words per Review: 526 | Rating: 7.3
    Also Known As
    Pentaho, Kettle, Hitachi Pentaho Business Analytics
    Overview

    Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
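
The contrast described above can be sketched in a few lines of plain Python. This is a single-process toy for illustration only, not Spark's API and not a distributed implementation: `mapreduce_pass`, `WorkingSet`, and `demo` are hypothetical names introduced here. The first function mimics the linear read-map-reduce-write dataflow, while the class stands in for a read-only working set that stays in memory and can be reused across several operations.

```python
import json
import operator
import os
import tempfile
from functools import reduce


def mapreduce_pass(input_path, map_fn, reduce_fn, initial, output_path):
    """One MapReduce-style pass: read from disk, map, reduce, write to disk."""
    with open(input_path) as f:
        records = json.load(f)
    mapped = map(map_fn, records)
    result = reduce(reduce_fn, mapped, initial)
    with open(output_path, "w") as f:
        json.dump(result, f)
    return result


class WorkingSet:
    """Toy stand-in for a read-only, in-memory collection (hypothetical)."""

    def __init__(self, items):
        self._items = tuple(items)  # immutable, so the set is read-only

    def map(self, fn):
        # Transformations return a new WorkingSet; the original is unchanged.
        return WorkingSet(fn(x) for x in self._items)

    def reduce(self, fn, initial):
        return reduce(fn, self._items, initial)


def demo():
    data = [1, 2, 3, 4]
    # Disk-bound pass: input and output both live in files.
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "input.json")
        dst = os.path.join(tmp, "output.json")
        with open(src, "w") as f:
            json.dump(data, f)
        disk_total = mapreduce_pass(src, lambda x: x * x, operator.add, 0, dst)
    # In-memory working set: loaded once, reused for two different reductions.
    ws = WorkingSet(data)
    squares = ws.map(lambda x: x * x)
    return disk_total, squares.reduce(operator.add, 0), ws.reduce(operator.add, 0)
```

Calling `demo()` runs the same sum-of-squares computation both ways and then reuses the in-memory set for a plain sum without touching disk again. Partitioning across machines and fault tolerance, which the description above attributes to RDDs, are deliberately left out of this sketch.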

    Pentaho is an open source business intelligence company that provides a wide range of tools to help their customers better manage their businesses. These tools include data integration software, mining tools, dashboard applications, online analytical processing options, and more.

    Pentaho has two product categories. The first is the standard enterprise version: the product that comes directly from Pentaho itself, with all of the benefits, features, and programs that come with a paid application, such as analysis services, dashboard design, and interactive reporting.

    The alternative is an open source version, which the public is permitted to extend and tweak. Aside from being free, this version has the advantage that many more people are working on the project to improve its quality and breadth of functionality.

    Sample Customers
    NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
    Cargo 2000 Lufthansa, Marketo, ModCloth, Cardiac Science, Telefonica, ExactTarget, Active Broadband Networks, and Brussels Airport.
    Top Industries
    Apache Spark
      Reviewers: Computer Software Company 30%, Financial Services Firm 15%, University 9%, Marketing Services Firm 6%
      Visitors reading reviews: Financial Services Firm 25%, Computer Software Company 13%, Manufacturing Company 7%, Comms Service Provider 6%
    Pentaho Business Analytics
      Reviewers: Computer Software Company 19%, University 13%, Financial Services Firm 13%, Educational Organization 6%
      Visitors reading reviews: Financial Services Firm 23%, Government 12%, Computer Software Company 12%, Educational Organization 8%
    Company Size
    Apache Spark
      Reviewers: Small Business 40%, Midsize Enterprise 18%, Large Enterprise 42%
      Visitors reading reviews: Small Business 17%, Midsize Enterprise 12%, Large Enterprise 71%
    Pentaho Business Analytics
      Reviewers: Small Business 50%, Midsize Enterprise 17%, Large Enterprise 33%
      Visitors reading reviews: Small Business 26%, Midsize Enterprise 13%, Large Enterprise 61%

    Apache Spark is ranked 1st in Hadoop with 60 reviews while Pentaho Business Analytics is ranked 19th in BI (Business Intelligence) Tools with 42 reviews. Apache Spark is rated 8.4, while Pentaho Business Analytics is rated 8.0. The top reviewer of Apache Spark writes "Reliable, able to expand, and handle large amounts of data well". On the other hand, the top reviewer of Pentaho Business Analytics writes "Flexible, easy to understand, and simple to set up". Apache Spark is most compared with Spring Boot, AWS Batch, Spark SQL, SAP HANA and Cloudera Distribution for Hadoop, whereas Pentaho Business Analytics is most compared with Microsoft Power BI, Databricks, Microsoft SQL Server Reporting Services, SAP Crystal Reports and KNIME.

    We monitor all Hadoop reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.