We performed a comparison between Pentaho Data Integration and Analytics, SSIS, and StreamSets based on real PeerSpot user reviews.
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"It's very simple compared to other products out there."
"It has improved our data integration capabilities."
"One of the valuable features is the ability to use PL/SQL statements inside the data transformations and jobs."
"It makes it pretty simple to do some fairly complicated things. Both I and some of our other BI developers have made stabs at using, for example, SQL Server Integration Services, and we found them a little bit frustrating compared to Data Integration. So, its ease of use is right up there."
"It's my understanding that the product can scale."
"The abstraction is quite good."
"I can use Python, which is open-source, and I can run other scripts, including Linux scripts. It's user-friendly for running any object-based language. That's a very important feature because we live in a world of open-source."
"The scalability of SSIS is good."
"Overall, it's a good product."
"SSIS is an easy way to do data integration from various data sources. It doesn't matter whether it's a database, flat files, XML, or Web API. It can talk to the and join them all together."
"The initial setup was easy."
"The most valuable aspect of this solution is that it is simple to use and it offers a flexible custom script task."
"The product's deployment phase is easy."
"The most valuable features of SSIS are that it works with the query language and it can import data from different sources."
"SSIS integrates well with SQL servers and Microsoft products."
"I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
"For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
"What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
"StreamSets data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though, it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
"Its basic functionality doesn't need a whole lot of change. There could be some improvement in the consistency of the behavior of different transformation steps. The software did start as open-source and a lot of the fundamental, everyday transformation steps that you use when building ETL jobs were developed by different people. It is not a seamless paradigm. A table input step has a different way of thinking than a data merge step."
"A big problem after deploying something that we do in Lumada is with Git. You get a binary file to do a code review. So, if you need to do a review, you have to take pictures of the screen to show each step. That is the biggest bug if you are using Git."
"The web interface is rusty, and the biggest problem with Pentaho is debugging and troubleshooting. It isn't easy to build the pipeline incrementally. At least in our case, it's hard to find a way to execute step by step in the debugging mode."
"If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was."
"It's not very stable, at least not in the case of the community edition. I'm working with the community edition right now and I think perhaps it is because of that it is not very stable, it causes the system to sometimes hang. I'm not sure if this is the case for pair tiers."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"I would like to see improvement when it comes to integrating structured data with text data or anything that is unstructured. Sometimes we get all kinds of different files that we need to integrate into the warehouse."
"The product needs more plugins."
"Tuning using this solution requires extensive expertise to improve performance."
"I come from a coding background and this tool is graphically based. Sometimes I think it's cumbersome to do mapping graphically. If there was a way to provide a simple script, it would be helpful and make it easier to use."
"I would also like to see full integration with our BI because then our full load of data will be available in our organization. They should incorporate an ATL process."
"When I compare Talend and SSIS, Talend provides more features. With Talend, we can handle a large volume of data. Talend is usually used to treat a large volume of data, which makes it better than SSIS on the data side. Talend also has a very good Talend Management Console to schedule the jobs and do other things. It can also be easily connected to version control tools such as GitHub or SVN. The last time I used SSIS, it was connected through TSS for the Windows Console version. I am not sure it has been improved or not. If it is not improved, Microsoft should improve it. They should change the product to provide another console."
"We'd like them to develop data exploration more."
"It would be nice if you could run SSIS on other environments besides Windows."
"There are a lot of things that Microsoft could improve in relation to SSIS. One major problem we faced was when attempting to move some Excel files to our SQL Server. The Excel provider has a limitation that prevents importing more than 255 columns from a particular Excel file to the database. This restriction posed a significant issue for us."
"We'd like more integration capabilities."
"There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline."
"The monitoring visualization is not that user-friendly. It should include other features to visualize things, like how many records were streamed from a source to a destination on a particular date."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
"I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."