Pentaho Data Integration and Analytics vs StreamSets comparison

Cancel
You must select at least 2 products to compare!
Hitachi Vantara Logo
3,346 views|1,127 comparisons
94% willing to recommend
StreamSets Logo
4,226 views|2,398 comparisons
100% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Pentaho Data Integration and Analytics and StreamSets based on real PeerSpot user reviews.

Find out in this report how the two Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI.
To learn more, read our detailed Pentaho Data Integration and Analytics vs. StreamSets Report (Updated: March 2024).
767,847 professionals have used our research since 2012.
Featured Review
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
"Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side.""We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic.""We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice.""This solution allows us to create pipelines using a minimal amount of custom coding.""The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data.""Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing.""It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient.""The amount of data that it loads and processes is good."

More Pentaho Data Integration and Analytics Pros →

"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems.""The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them.""The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows.""StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes.""The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth.""The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too""It is really easy to set up and the interface is easy to use.""StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."

More StreamSets Pros →

Cons
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors.""As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows.""I would like to see improvements made for real-time data processing.""If you develop it on MacBook, it'll be quite a hassle.""I would like to see more improvements with AS400 DB2.""One thing that I don't like, just a little, is the backward compatibility.""If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was.""​I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse​."

More Pentaho Data Integration and Analytics Cons →

"The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best.""StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target.""I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions.""If you use JDBC Lookup, for example, it generally takes a long time to process data.""The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information.""Visualization and monitoring need to be improved and refined.""Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful.""One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."

More StreamSets Cons →

Pricing and Cost Advice
  • "There is a good open source option (Community Edition)​."
  • "The price of the regular version is not reasonable and it should be lower."
  • "Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."
  • "It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
  • "I think Lumada's price is fair compared to some of the others, like BusinessObjects, which is was the other thing that I used at my previous job. BusinessObject's price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
  • "When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
  • "The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
  • "The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."
  • More Pentaho Data Integration and Analytics Pricing and Cost Advice →

  • "We are running the community version right now, which can be used free of charge."
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."
  • "We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
  • "The overall cost for small and mid-size organizations needs to be better."
  • "There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
  • More StreamSets Pricing and Cost Advice →

    report
    Use our free recommendation engine to learn which Data Integration solutions are best for your needs.
    767,847 professionals have used our research since 2012.
    Questions from the Community
    Top Answer:Hi Rajneesh yes here is the feature comparison between the community and enterprise edition :… more »
    Top Answer: In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with Hitachi Lumada Data Integration and while it does a good job for what it’s worth, it… more »
    Top Answer:My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job. We were most satisfied with the results in terms of how many people could use… more »
    Top Answer:I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines… more »
    Top Answer:StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target. So the ability to validate the data against various data… more »
    Top Answer:We are using StreamSets to migrate our on-premise data to the cloud.
    Ranking
    16th
    out of 100 in Data Integration
    Views
    3,346
    Comparisons
    1,127
    Reviews
    15
    Average Words per Review
    1,193
    Rating
    7.7
    8th
    out of 100 in Data Integration
    Views
    4,226
    Comparisons
    2,398
    Reviews
    21
    Average Words per Review
    1,337
    Rating
    8.4
    Comparisons
    Also Known As
    Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
    Learn More
    StreamSets
    Video Not Available
    Overview

    Pentaho Data Integration stands as a versatile platform designed to cater to the data integration and analytics needs of organizations, regardless of their size. This powerful solution is the go-to choice for businesses seeking to seamlessly integrate data from diverse sources, including databases, files, and applications. Pentaho Data Integration facilitates the essential tasks of cleaning and transforming data, ensuring it's primed for meaningful analysis. With a wide array of tools for data mining, machine learning, and statistical analysis, Pentaho Data Integration empowers organizations to glean valuable insights from their data. What sets Pentaho Data Integration apart is its maturity and a vibrant community of users and developers, making it a reliable and cost-effective option. Pentaho Data Integration offers a range of features, including a comprehensive ETL toolkit, data cleaning and transformation capabilities, robust data analysis tools, and seamless deployment options for data integration and analytics solutions, making it a go-to solution for organizations seeking to harness the power of their data.

    StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.

    Sample Customers
    66Controls, Providential Revenue Agency of Ro Negro, NOAA Information Systems, Swiss Real Estate Institute
    Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Top Industries
    REVIEWERS
    Healthcare Company19%
    Financial Services Firm19%
    Comms Service Provider11%
    Manufacturing Company11%
    VISITORS READING REVIEWS
    Financial Services Firm19%
    Computer Software Company13%
    Comms Service Provider11%
    Government7%
    REVIEWERS
    Financial Services Firm20%
    Energy/Utilities Company20%
    Comms Service Provider13%
    Computer Software Company13%
    VISITORS READING REVIEWS
    Financial Services Firm17%
    Computer Software Company13%
    Manufacturing Company8%
    Government7%
    Company Size
    REVIEWERS
    Small Business27%
    Midsize Enterprise31%
    Large Enterprise42%
    VISITORS READING REVIEWS
    Small Business21%
    Midsize Enterprise11%
    Large Enterprise68%
    REVIEWERS
    Small Business40%
    Midsize Enterprise12%
    Large Enterprise48%
    VISITORS READING REVIEWS
    Small Business16%
    Midsize Enterprise11%
    Large Enterprise73%
    Buyer's Guide
    Pentaho Data Integration and Analytics vs. StreamSets
    March 2024
    Find out what your peers are saying about Pentaho Data Integration and Analytics vs. StreamSets and other solutions. Updated: March 2024.
    767,847 professionals have used our research since 2012.

    Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. Pentaho Data Integration and Analytics is rated 8.0, while StreamSets is rated 8.4. The top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and AWS Glue, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and IBM InfoSphere DataStage. See our Pentaho Data Integration and Analytics vs. StreamSets report.

    See our list of best Data Integration vendors.

    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.