Pentaho Data Integration and Analytics vs SSIS vs StreamSets comparison

Pentaho Data Integration and Analytics (Hitachi Vantara): 3,346 views | 1,127 comparisons | 94% willing to recommend
SSIS (Microsoft): 19,568 views | 15,878 comparisons | 79% willing to recommend
StreamSets: 4,226 views | 2,398 comparisons | 100% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Pentaho Data Integration and Analytics, SSIS, and StreamSets based on real PeerSpot user reviews.

Find out what your peers are saying about Microsoft, Informatica, Oracle and others in Data Integration.
To learn more, read our detailed Data Integration Report (Updated: April 2024).
767,847 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Pentaho Data Integration and Analytics:
  • "It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
  • "It's very simple compared to other products out there."
  • "It has improved our data integration capabilities."
  • "One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
  • "It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
  • "It's my understanding that the product can scale."
  • "The abstraction is quite good."
  • "I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."

SSIS:
  • "The scalability of SSIS is good."
  • "Overall, it's a good product."
  • "SSIS is an easy way to do data integration from various data sources. It doesn't matter whether it's a database, flat files, XML, or Web API. It can talk to them and join them all together."
  • "The initial setup was easy."
  • "The most valuable aspect of this solution is that it is simple to use and it offers a flexible custom script task."
  • "The product's deployment phase is easy."
  • "The most valuable features of SSIS are that it works with the query language and it can import data from different sources."
  • "SSIS integrates well with SQL servers and Microsoft products."

StreamSets:
  • "I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
  • "For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
  • "What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
  • "StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
  • "StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
  • "The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
  • "Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
  • "The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."

Cons
Pentaho Data Integration and Analytics:
  • "Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
  • "A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
  • "The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
  • "If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was."
  • "It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for paid tiers."
  • "I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
  • "I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
  • "The product needs more plugins."

SSIS:
  • "Tuning using this solution requires extensive expertise to improve performance."
  • "I come from a coding background and this tool is graphically based. Sometimes I think it's cumbersome to do mapping graphically. If there was a way to provide a simple script, it would be helpful and make it easier to use."
  • "I would also like to see full integration with our BI because then our full load of data will be available in our organization. They should incorporate an ETL process."
  • "When I compare Talend and SSIS, Talend provides more features. With Talend, we can handle a large volume of data. Talend is usually used to treat a large volume of data, which makes it better than SSIS on the data side. Talend also has a very good Talend Management Console to schedule the jobs and do other things. It can also be easily connected to version control tools such as GitHub or SVN. The last time I used SSIS, it was connected through TSS for the Windows Console version. I am not sure it has been improved or not. If it is not improved, Microsoft should improve it. They should change the product to provide another console."
  • "We'd like them to develop data exploration more."
  • "It would be nice if you could run SSIS on other environments besides Windows."
  • "There are a lot of things that Microsoft could improve in relation to SSIS. One major problem we faced was when attempting to move some Excel files to our SQL Server. The Excel provider has a limitation that prevents importing more than 255 columns from a particular Excel file to the database. This restriction posed a significant issue for us."
  • "We'd like more integration capabilities."

StreamSets:
  • "There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline."
  • "The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
  • "The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
  • "I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
  • "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
  • "If you use JDBC Lookup, for example, it generally takes a long time to process data."
  • "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
  • "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."

Pricing and Cost Advice
Pentaho Data Integration and Analytics:
  • "There is a good open source option (Community Edition)."
  • "The price of the regular version is not reasonable and it should be lower."
  • "Sometimes we provide the licenses or the customer can procure their own licenses. Previously, we had an enterprise license. Currently, we are on a community license as this is adequate for our needs."
  • "It does seem a bit expensive compared to the serverless product offering. Tools, such as Server Integration Services, are "almost" free with a database engine. It is comparable to products like Alteryx, which is also very expensive."
  • "I think Lumada's price is fair compared to some of the others, like BusinessObjects, which was the other thing that I used at my previous job. BusinessObjects' price was more reasonable before SAP acquired it. They jacked the price up significantly. Oracle's OBIEE tool was also prohibitively expensive."
  • "When we first started with it, it was much cheaper. It has gone up drastically, especially since Hitachi bought out Pentaho."
  • "The cost of these types of solutions are expensive. So, we really appreciate what we get for our money. Though, we don't think of the solution as a top-of-the-line solution or anything like that."
  • "The pricing has been pretty good. I'm used to using everything open-source or freeware-based. I understand that organizations need to make sure that the solutions are secure, and that's basically where I hit a roadblock in my current organization. They needed to ensure that we had a license and we had a secure way of accessing it so that no outside parties could get access to our data, but in terms of pricing, considering how much other teams are spending on cloud solutions or even their existing solutions, its price point is pretty good. At this time, there are no additional costs. We just have the licensing fees."

SSIS:
  • "This solution has provided an inexpensive tool, and it is easy to find experienced developers."
  • "My advice is to look at what your configuration will be because most companies have their own deals with Microsoft."
  • "This solution is included with the MSSQL server package."
  • "It would be beneficial if the solution had a less costly cloud offering."
  • "Based on my experience and understanding, Talend comes out to be a little bit expensive as compared to SSIS. The average cost of having Talend with Talend Management Console is around 72K per region, which is much higher than SSIS. SSIS works very well with Microsoft technologies, and if you have Microsoft technologies, it is not really expensive to have SSIS. If you have SQL Server, SSIS is free."
  • "We have an enterprise license for this solution."
  • "It comes bundled with other solutions, which makes it difficult to get the price on the specific product."
  • "All of my clients have this product included as part of their Microsoft license."

StreamSets:
  • "We are running the community version right now, which can be used free of charge."
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."
  • "We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
  • "The overall cost for small and mid-size organizations needs to be better."
  • "There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."

    Comparison Review
    Anonymous User
    Technology has made it easier for businesses to organize and manipulate data to get a clearer picture of what's going on with their business. Notably, ETL tools have made managing huge amounts of data significantly easier and faster, boosting many organizations' business intelligence operations.

    There are many third-party vendors offering ETL solutions, but two of the most popular are PowerCenter Informatica and Microsoft SSIS (SQL Server Integration Services). Each technology has its advantages, but there are also similarities in how they carry out the extract-transform-load process, with some differences only in terminology. If you're in the process of choosing ETL tools and PowerCenter Informatica and Microsoft SSIS made it to your shortlist, here is a short comparative discussion detailing the differences between the two, as well as their benefits.

    Package Configuration
    Most enterprise data integration projects require the capacity to develop a solution in one environment and test and deploy it in a separate environment without having to manually change the established workflow. To achieve this seamless movement between two environments, your ETL technology should allow the dynamic update of the project's properties using the content of a parameter file or configuration. Both Informatica and SSIS support this functionality using different methodologies. In Informatica, every session can have more than one source and one or more destination connections. There are…
    Questions from the Community
    Top Answer: Hi Rajneesh, yes, here is the feature comparison between the community and enterprise editions: …
    Top Answer: In my opinion, the reporting side of this tool needs serious improvements. In my previous company, we worked with…
    Top Answer: My company has used this product to transform data from databases, CSV files, and flat files. It really does a good job…
    Top Answer: SSIS PowerPack is a group of drag and drop connectors for Microsoft SQL Server Integration Services, commonly called…
    Top Answer: The product's deployment phase is easy.
    Top Answer: If you don't want to pay a lot of money, you can go for SSIS, as its open-source version is available. When it comes to…
    Top Answer: I really appreciate the numerous ready connectors available on both the source and target sides, the support for various…
    Top Answer: StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from…
    Top Answer: We are using StreamSets to migrate our on-premise data to the cloud.
    Ranking
    Pentaho Data Integration and Analytics: 16th out of 100 in Data Integration
      Views: 3,346 | Comparisons: 1,127 | Reviews: 15 | Average Words per Review: 1,193 | Rating: 7.7
    SSIS: 2nd out of 100 in Data Integration
      Views: 19,568 | Comparisons: 15,878 | Reviews: 35 | Average Words per Review: 471 | Rating: 7.7
    StreamSets: 8th out of 100 in Data Integration
      Views: 4,226 | Comparisons: 2,398 | Reviews: 21 | Average Words per Review: 1,337 | Rating: 8.4
    Also Known As
    Pentaho Data Integration and Analytics: Hitachi Lumada Data Integration, Kettle, Pentaho Data Integration
    SSIS: SQL Server Integration Services
    Overview

    Pentaho Data Integration is a data integration and analytics platform for organizations of any size. It integrates data from diverse sources, including databases, files, and applications, and handles the cleaning and transformation needed to prepare that data for analysis. It also provides tools for data mining, machine learning, and statistical analysis. Its maturity and its active community of users and developers make it a reliable and cost-effective option. Features include a comprehensive ETL toolkit, data cleaning and transformation capabilities, data analysis tools, and deployment options for data integration and analytics solutions.

    SSIS is a versatile tool for data integration tasks like ETL processes, data migration, and real-time data processing. Users appreciate its ease of use, data transformation tools, scheduling capabilities, and extensive connectivity options. It enhances productivity and efficiency within organizations by streamlining data-related processes and improving data quality and consistency.

    StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.

    Sample Customers
    Pentaho Data Integration and Analytics: 66Controls, Providential Revenue Agency of Río Negro, NOAA Information Systems, Swiss Real Estate Institute
    SSIS: Amazon.com, Bank of America, Capital One, Coca-Cola, Dell, E*TRADE, FedEx, Ford Motor Company, Google, Home Depot, IBM, Intel, JPMorgan Chase, Kraft Foods, Lockheed Martin, McDonald's, Microsoft, Morgan Stanley, Nike, Oracle, PepsiCo, Procter & Gamble, Prudential Financial, RBC Capital Markets, SAP, Siemens, Sony, Toyota, UnitedHealth Group, Visa, Walmart, Wells Fargo
    StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Top Industries
    Pentaho Data Integration and Analytics
      Reviewers: Healthcare Company 19% | Financial Services Firm 19% | Comms Service Provider 11% | Manufacturing Company 11%
      Visitors reading reviews: Financial Services Firm 19% | Computer Software Company 13% | Comms Service Provider 11% | Government 7%
    SSIS
      Reviewers: Financial Services Firm 23% | Government 8% | Retailer 8% | Insurance Company 8%
      Visitors reading reviews: Financial Services Firm 17% | Computer Software Company 12% | Government 7% | Healthcare Company 6%
    StreamSets
      Reviewers: Financial Services Firm 20% | Energy/Utilities Company 20% | Comms Service Provider 13% | Computer Software Company 13%
      Visitors reading reviews: Financial Services Firm 17% | Computer Software Company 13% | Manufacturing Company 8% | Government 7%
    Company Size
    Pentaho Data Integration and Analytics
      Reviewers: Small Business 27% | Midsize Enterprise 31% | Large Enterprise 42%
      Visitors reading reviews: Small Business 21% | Midsize Enterprise 11% | Large Enterprise 68%
    SSIS
      Reviewers: Small Business 27% | Midsize Enterprise 18% | Large Enterprise 55%
      Visitors reading reviews: Small Business 18% | Midsize Enterprise 13% | Large Enterprise 69%
    StreamSets
      Reviewers: Small Business 40% | Midsize Enterprise 12% | Large Enterprise 48%
      Visitors reading reviews: Small Business 16% | Midsize Enterprise 11% | Large Enterprise 73%