2018-08-08T07:09:00Z

What needs improvement with StreamSets?

Miriam Tover - PeerSpot reviewer
PeerSpot user

17 Answers

MB
Real User
Top 20
2023-07-21T08:45:00Z
Jul 21, 2023

StreamSets should provide a mechanism for assessing data quality while data is being moved from the source to the target, including the ability to validate the data against various data rules. Then, when a data quality assessment fails, it should be able to send alerts or information to help people understand the data validation issues.

Saket Pandey - PeerSpot reviewer
Real User
Top 5 Leaderboard
2023-05-17T11:24:00Z
May 17, 2023

The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer. Its initial setup could also be a bit easier.

Avinash Mukesh - PeerSpot reviewer
Real User
Top 5 Leaderboard
2023-05-17T11:19:00Z
May 17, 2023

When using Transformer for Snowflake, it's a bit complex to understand the transformation logic. You need someone who has some technical skills to handle it. You need to have some skills to transform the data. However, it's important that Transformer for Snowflake is a serverless engine embedded within the platform, so there is no need for maintenance. Having a serverless engine makes it easy for any enterprise to not think about or worry about the cost of maintaining the software. The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed.

Namanya Brian - PeerSpot reviewer
Real User
Top 10 Leaderboard
2023-04-14T09:32:00Z
Apr 14, 2023

Sometimes, it is not clear at first how to set up nodes. A site with an explanation of how each node works would be very helpful. Also, it doesn't provide a very good user experience.

Nantabo Jackie - PeerSpot reviewer
Real User
Top 5 Leaderboard
2023-03-24T12:46:00Z
Mar 24, 2023

I found that if the connection drops and the pipeline is restarted, it sometimes does not reconnect, which has room for improvement. The documentation is also inadequate because the technical support team does not regularly update the documentation or the knowledge base. This leads to discrepancies between the software and the documentation, making the product difficult to understand.

Kevin Kathiem Mutunga - PeerSpot reviewer
Real User
Top 10
2023-03-24T12:32:00Z
Mar 24, 2023

There should be a concept of creating double variables because it's still missing. The loading machine mechanism needs to be simplified. Currently, it takes some time to get familiar with and understand that. Visualization and monitoring need to be improved and refined. For example, it is difficult to monitor a job to see what happened in the past seven days when a transfer occurred. The licensing model also has room for improvement. The solution is currently expensive.

Updated: March 2024.
Reyansh Kumar - PeerSpot reviewer
Real User
Top 5
2023-03-10T04:20:00Z
Mar 10, 2023

The user interface requires some corrections in terms of the menu settings, menu items, and report generation. Also, report generation takes some time.

Ramesh Kuppuswamy - PeerSpot reviewer
Real User
Top 5
2023-01-06T23:33:00Z
Jan 6, 2023

The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information. Apart from that, I don't think much improvement is required, because the software and features are very good.

Sumesh Gansar - PeerSpot reviewer
Real User
Top 5 Leaderboard
2023-01-06T22:56:00Z
Jan 6, 2023

In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time. Some visual explanation or some visually appealing knowledge-based content would be very good. That is something that I could have done with, once I started using it, because I found it very difficult.

SR
Real User
Top 5
2023-01-06T22:40:00Z
Jan 6, 2023

In terms of features, I don't have any complaints so far. But one area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there.

TH
Real User
Top 20
2022-12-01T21:40:00Z
Dec 1, 2022

The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best. However, we have a couple of people in-house here who are experts in data analysis and they have figured out how to use this tool. We have to have people who are extremely skilled to go in and write the pipelines for this software because it's so complicated. The software works great for us, but there is an extremely steep learning curve because they don't provide a lot of information outside of paying their ridiculous support costs. Their support starts at $50,000 a year and up. Also, the built-in data drift resilience for ETL operations requires a bunch of custom code development to be able to handle that. It's somewhat difficult because you have to customize it a fair amount. I also would like a more user-friendly interface and better error-trap handling.

Prateek Agarwal - PeerSpot reviewer
Real User
Top 5 Leaderboard
2022-08-21T07:36:00Z
Aug 21, 2022

Sometimes, when we have large amounts of data that are very efficiently stored in Hadoop or Kafka, it is not very efficient to run them through StreamSets, due to StreamSets' lack of efficiency or the resources it uses. Also, the hierarchy of names within the dropdowns and the drag-and-drop features are not familiar to users who do not have a technical or programming background. In those cases, the naming conventions are a challenge.

Karthik Rajamani - PeerSpot reviewer
Real User
Top 10
2022-06-14T17:08:00Z
Jun 14, 2022

There are a few things that can be better. We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back. There are certain features that are only available at certain stages. For example, HTTP Client has some great features when it is used as a processor, but those features are not available in HTTP Client as a destination. There could be some improvements on the group side. Currently, if I want to know which users are a part of certain groups, it is not straightforward to see. You have to go to each and every user and check the groups he or she is a part of. They could improve it in that direction. Currently, we have to put in a manual effort. In case something goes wrong, we have to go to each and every user account to check whether he or she is a part of a certain group or not.

SS
Real User
Top 20
2022-06-09T15:40:00Z
Jun 9, 2022

One area for improvement is probably the GUI. It is pretty basic and a lot of improvement is required there. In terms of security, from an architecture perspective, when we want to implement something, and because our organization is very strict when it comes to cybersecurity, we have been struggling a bit because the platform has a few gaps. Those gaps are really gaps based on our organization's requirements. These are not gaps on StreamSets' side. The solution could improve a lot in terms of having more features added to the security model, which would help us. There are quite a few features that we wanted. One is SAP HANA. Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful.

AbhishekKatara - PeerSpot reviewer
Real User
Top 10
2022-05-15T09:42:00Z
May 15, 2022

The logging mechanism could be improved. If I am working on a pipeline, then create a job from it and run it, it generates logs constantly. So, the logging mechanism could be simplified. Right now, it is a bit difficult to understand and filter the logs. It takes some time. For example, if I am starting with StreamSets, everything is fine. However, if I want to dig into problems that my pipeline ran into, it initially takes some time to get familiar with it and understand it. I feel the visualization part can be simplified or enhanced a bit, so I can easily see what happened with my job seven days earlier and how many records it transmitted.

MP
Real User
2020-11-19T21:01:53Z
Nov 19, 2020

We've seen a couple of cases where it appears to have a memory leak or a similar problem. It grows for a bit and then we'd have to restart the container, maybe once a month when it gets high.

AC
Real User
2018-08-08T07:09:00Z
Aug 8, 2018

I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks. StreamSets works great for batch processing but we are looking for something that is more real-time. We need sub-millisecond latency.

StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.