We performed a comparison between Pentaho Data Integration and Analytics and StreamSets based on real PeerSpot user reviews.
Find out in this report how the two Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI."Sometimes, it took a whole team about two weeks to get all the data to prepare and present it. After the optimization of the data, it took about one to two hours to do the whole process. Therefore, it has helped a lot when you talk about money, because it doesn't take a whole team to do it, just one person to do one project at a time and run it when you want to run it. So, it has helped a lot on that side."
"We also haven't had to create any custom Java code. Almost everywhere it's SQL, so it's done in the pipeline and the configuration. That means you can offload the work to people who, while they are not less experienced, are less technical when it comes to logic."
"We can schedule job execution in the BA Server, which is the front-end product we're using right now. That scheduling interface is nice."
"This solution allows us to create pipelines using a minimal amount of custom coding."
"The area where Lumada has helped us is in the commercial area. There are many extractions to compose reports about our sales team performance and production steps. Since we are using Lumada to gather data from each industry in each country. We can get data from Argentina, Chile, Brazil, and Colombia at the same time. We can then concentrate and consolidate it in only one place, like our data warehouse. This improves our production performance and need for information about the industry, production data, and commercial data."
"Lumada has allowed us to interact with our employees more effectively and compensate them properly. One of the cool things is that we use it to generate commissions for our salespeople and bonuses for our warehouse people. It allows us to get information out to them in a timely fashion. We can also see where they're at and how they're doing."
"It has a really friendly user interface, which is its main feature. The process of automating or combining SQL code with some databases and doing the automation is great and really convenient."
"The amount of data that it loads and processes is good."
"The ETL capabilities are very useful for us. We extract and transform data from multiple data sources, into a single, consistent data store, and then we put it in our systems. We typically use it to connect our Apache Kafka with data lakes. That process is smooth and saves us a lot of time in our production systems."
"The most valuable feature is the pipelines because they enable us to pull in and push out data from different sources and to manipulate and clean things up within them."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"The entire user interface is very simple and the simplicity of creating pipelines is something that I like very much about it. The design experience is very smooth."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"It is really easy to set up and the interface is easy to use."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"I have been facing some difficulties when working with large datasets. It seems that when there is a large amount of data, I experience memory errors."
"As far as I remember, not all connectors worked very well. They can add more connectors and more drivers to the process to integrate with more flows."
"I would like to see improvements made for real-time data processing."
"If you develop it on MacBook, it'll be quite a hassle."
"I would like to see more improvements with AS400 DB2."
"One thing that I don't like, just a little, is the backward compatibility."
"If you're working with a larger data set, I'm not so sure it would be the best solution. The larger things got the slower it was."
"I could not connect to our Hadoop environment in an easy and flexible way, and it was important to scale our data warehouse."
"The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"Visualization and monitoring need to be improved and refined."
"Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful."
"One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."
More Pentaho Data Integration and Analytics Pricing and Cost Advice →
Pentaho Data Integration and Analytics is ranked 16th in Data Integration with 48 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. Pentaho Data Integration and Analytics is rated 8.0, while StreamSets is rated 8.4. The top reviewer of Pentaho Data Integration and Analytics writes "It's flexible and can do almost anything I want it to do". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". Pentaho Data Integration and Analytics is most compared with Azure Data Factory, SSIS, Talend Open Studio, Oracle Data Integrator (ODI) and AWS Glue, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and IBM InfoSphere DataStage. See our Pentaho Data Integration and Analytics vs. StreamSets report.
See our list of best Data Integration vendors.
We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.