Avinash Mukesh - PeerSpot reviewer
IT Specialists at Soft Hostings
Real User
Top 5 Leaderboard
User-friendly interface and easy integration, but needs easier transformation logic and faster support
Pros and Cons
  • "It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
  • "The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."

What is our primary use case?

We share data between platforms. StreamSets lets me work independently of other ETL tools and get the data into the format I need without using any programming language.

How has it helped my organization?

It's helping us to be more organized. It makes it easy to extract data sets from CRM tools, and it can be integrated with external sources so that you have a solid platform. It has improved the way we perform tests, data transfers, and streaming.

The data collection process is straightforward and easy. It allows us to move data into modern analytics platforms.

It allows us to build data pipelines without knowing how to code. It allows developers to make sure they are getting the correct data. It works for departments that can code and that can't code. It's a universal tool.

It's very effective. It gives you a clear understanding of the architecture of the data that you have in your company.

StreamSets’ data drift resilience saved us a lot of time. If we were taking seven days previously to build something, now it takes us three days. It has saved about 30% of the time.

It has helped to break down data silos within the organization. It helps to make sure that we are on time with data analysis. It brings efficiency. Overall, it has saved us about 25% of the time.

StreamSets’ reusable assets have helped to reduce workload. There is about a 25% workload reduction.

StreamSets saves us money by not having to hire people with specialized skills. It's saving us 300 USD every month.

StreamSets has helped to scale our data operations. In our business, we process the data the whole time, and we share it with the analytics team to identify and understand what needs to be fixed and what needs to be improved. It's good for our organization.

What is most valuable?

Its user interface is friendly. It's straightforward to implement batch, streaming, or ETL pipelines.

It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated.

What needs improvement?

When using Transformer for Snowflake, it's a bit complex to understand the transformation logic. You need someone with technical skills to handle it and to transform the data. However, it's important that Transformer for Snowflake is a serverless engine embedded within the platform, so there is no need for maintenance. Having a serverless engine means an enterprise doesn't have to worry about the cost of maintaining the software.

The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed.
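The MySQL example above reads as a missing-driver problem. As a rough pre-flight check, a short Python sketch could confirm that a driver JAR is present before a pipeline is configured. The folder layout and the `mysql-connector` filename prefix are illustrative assumptions, not StreamSets settings.

```python
from pathlib import Path


def find_driver_jars(lib_dir, prefix="mysql-connector"):
    """Return the sorted names of JAR files in lib_dir whose names start with prefix."""
    folder = Path(lib_dir)
    if not folder.is_dir():
        # A missing folder simply means no driver was found.
        return []
    return sorted(
        p.name
        for p in folder.iterdir()
        if p.is_file() and p.name.startswith(prefix) and p.suffix == ".jar"
    )


def driver_installed(lib_dir, prefix="mysql-connector"):
    """Return True when at least one matching driver JAR is present."""
    return bool(find_driver_jars(lib_dir, prefix))
```

Calling `driver_installed()` with the folder where your deployment keeps external libraries gives a quick yes/no answer before anyone starts debugging the pipeline itself.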

Buyer's Guide
StreamSets
May 2024
Learn what your peers think about StreamSets. Get advice and tips from experienced pros sharing their opinions. Updated: May 2024.
772,679 professionals have used our research since 2012.

For how long have I used the solution?

I've been using StreamSets for three years.

What do I think about the stability of the solution?

It's very stable. It's very hard to find any downtime for the software.

What do I think about the scalability of the solution?

It's scalable enough. It integrates with AWS, Snowflake, Google Cloud, and Azure. It gives you a very good way to process and store your data.

We're using it in multiple departments in the same location. It's being used by the analytics team and our senior developers. There are about 10 people using this solution.

How are customer service and support?

They take a long time to respond to queries, but they are good people. They should improve the time to respond to queries. I'd rate them a six out of ten.

How would you rate customer service and support?

Neutral

Which solution did I use previously and why did I switch?

I didn't use any other solution previously.

How was the initial setup?

Deploying StreamSets is not so complex. It's easy. It takes about three days.

It doesn't require any maintenance from our side.

What about the implementation team?

We have an in-house team of five people.

What was our ROI?

We have seen an ROI. We use data analytics in marketing, and knowing where we need to market and where we need to improve increases our success rate. We have seen about 30% ROI.

What's my experience with pricing, setup cost, and licensing?

It's not expensive because you pay per month, and the tasks you can perform with it are huge. It's reliable and cost-effective.

What other advice do I have?

It's a very good tool if you need to access data from a CRM system, Salesforce, etc. However, it can't be used as an end-to-end integration tool because it lacks certain functionality. It could also be very expensive for small enterprises. 

Overall, I'd rate it a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Software Engineer at ZIDIYO
Real User
Enables us to create streams and pipelines that our analytics team can utilize to identify areas for improvement
Pros and Cons
  • "The UI is user-friendly. It doesn't require any technical know-how, and we can navigate to social media or use it more easily."
  • "Using ETL pipelines is a bit complicated and requires some technical aid."

What is our primary use case?

We use StreamSets to create data pipelines and to make sure that we know the exact analytics of our data usage within our company.

How has it helped my organization?

We use StreamSets' ability to connect to enterprise data stores such as Kafka. It is easy and simple to connect enterprise data stores as long as we follow the documentation.

We use StreamSets' ability to move data into the analytic platforms easily because we can use the template provided to extract data from the pipeline.

Being able to use Transformer for Snowflake to design both simple and complex transformation logic is important because it helps us break down a large amount of data into forms that the analytics team can understand and use to identify areas of improvement. As Transformer for Snowflake operates as a serverless engine, we can reduce our costs because we no longer need to purchase servers.

StreamSets enables us to create streams and pipelines that our analytics team can utilize to identify areas for improvement. Additionally, our marketing team can leverage the data generated from these reports to understand how we can integrate our products and services to benefit our brand.

StreamSets' data drift resilience is effective and user-friendly. We can use templates or build from scratch. Data drift resilience saves us around 35 percent of the time we spend fixing duplicates.

StreamSets has helped us break down data silos within our organization by providing a clear path forward and enhancing our productivity by breaking down a large amount of data that we can understand.

StreamSets saved us around 40 percent of our time.

With StreamSets, a small team can create data pipelines that would normally require an expert costing around $500 per month.

StreamSets helps us scale our operations because we understand the quality of the data we have and how we can integrate the data into our marketing needs.

What is most valuable?

The UI is user-friendly. It doesn't require any technical know-how, and we can navigate to social media or use it more easily.

What needs improvement?

Using ETL pipelines is a bit complicated and requires some technical aid.

The Transformer for Snowflake functionality is complex and requires a lot of logic.

For how long have I used the solution?

I have been using the solution for three years.

What do I think about the stability of the solution?

The solution is stable with no issues.

What do I think about the scalability of the solution?

The solution is scalable.

How are customer service and support?

The technical support team takes over eight hours to respond to our requests.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup is straightforward. I deployed the solution myself.

What about the implementation team?

The implementation was completed in-house.

What was our ROI?

StreamSets helps us increase our sales by 45 percent.

What's my experience with pricing, setup cost, and licensing?

StreamSets is expensive, especially for small businesses.

What other advice do I have?

I give the solution a nine out of ten.

The solution does not require maintenance from our end.

We have deployed StreamSets across our engineering team, data analytics team, and software development team.

StreamSets is an excellent solution for organizations that have a budget. The solution allows for various streaming capabilities and seamless integration with customer messaging, all within one environment. I highly recommend StreamSets.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Kevin Kathiem Mutunga - PeerSpot reviewer
Chief software engineer at Appnomu Business Services
Real User
Top 10
Enables us to build data pipelines without knowing how to code and helped us break down data silos within our organization
Pros and Cons
  • "The best feature that I really like is the integration."
  • "Visualization and monitoring need to be improved and refined."

What is our primary use case?

In our department, we use StreamSets to design data pipelines that load all data from various RDBMS sources to the cloud, such as Azure. We also use the data set for data analysts to generate panels for our organization, as well as for real-time use cases for monitoring and consuming other streaming data. Additionally, we are able to customize StreamSets to suit our needs and budget.

How has it helped my organization?

Using StreamSets to create pipelines for batch, streaming, or ETL is easy and straightforward. However, if one is new to StreamSets, it may not be so simple and may require a lot of documentation for assistance.

We utilize StreamSets' ability to connect to enterprise data stores, making it easy to begin trading instantly without needing to be technically skilled. We use StreamSets to move data into analytics platforms. In my experience, it is initially quite easy to move data back if we have a clear understanding of data transit, importation, and exporting from external sources.

This solution enables us to build data pipelines without knowing how to code. The solution includes templates that guide us and help us customize our data easily. It is essential that StreamSets does not necessitate coding, as this saves a considerable amount of time that would otherwise be spent writing code, as well as resources that would be required to hire experts.

Transformer for Snowflake can help with both simple and complex transformation logic. For example, creating a plan to perform ETL and machine learning operations is easy and fast. However, if the same operations are performed on-site, it can be difficult to troubleshoot events due to limited visibility into the results. StreamSets' Transformer for Snowflake is important to us because it saves us a lot of time and enables us to complete a task remotely with only two or three people.

It is important that Transformer for Snowflake is a serverless engine embedded within the platform. We have the capability of creating a data operations platform, so we don't have to worry or even be aware of what we are doing at the moment. We can simply create a device and use it in the pipeline we want it to be in.

The solution improved the way we work, benefiting both our customers and our development and retainer teams. StreamSets helps us develop a platform manually, with a lot of teamwork, either remotely or on-site, depending on which option we use. This has had a significant impact on our organization in terms of how we process and transform data.

I would say that it is very easy for us to update the template so that we can have real, actual data in APL claims and in the supply chain. StreamSets' data drift resilience is very effective and can run in the data grid. The data drift resilience has reduced the time it takes us to fix data drift breakages by approximately 25 percent.

StreamSets helped us break down data silos within our organization. The ability to break down data silos helps StreamSets to gain quick insights. In general, it is a great feature that ensures we have activities or processes in place. We know precisely what to prevent and what to implement.

StreamSets saved us around 30 percent of our time, meaning that a task that would take five hours to complete manually can now be done in around three and a half hours.
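The five-hour example above is plain percentage arithmetic. A small Python sketch makes the conversion explicit in both directions; the function names are illustrative only.

```python
def hours_after_saving(baseline_hours, percent_saved):
    """Hours a task takes once percent_saved percent of the baseline is saved."""
    return baseline_hours * (100 - percent_saved) / 100


def percent_saved(before_hours, after_hours):
    """Percentage of time saved when a task drops from before_hours to after_hours."""
    return 100 * (before_hours - after_hours) / before_hours


# A 30 percent saving on a five-hour task leaves three and a half hours.
print(hours_after_saving(5, 30))
```

The same helper can be used to check any other before-and-after figures against the percentage quoted alongside them.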

The reusable assets are reducing workload by 35 percent by allowing different people to use a single platform or resource, regardless of whether they have a similar SKU or a different SKU. This feature can help an organization simplify, implement, and transmit more easily.

It is not only the cost of one packet that we paid for, but now we are implementing a strategy using different people within the company. It would be very expensive if we had to hire a new person to manage that task and it would also take a lot of time. StreamSets is not only saving us money, but it is also ensuring that we complete strategies on time.

StreamSets has also helped us scale our operations, which has had a significant impact on our business. We now have a better understanding of how to secure data and provide reliable security for the transmission of data from internal servers to external services, as well as meeting our clients' application needs.

What is most valuable?

The best feature that I really like is the integration. The software can be integrated with Azure Key Vault or AWS Secrets Manager, as well as scheduling. It is very easy to schedule an event, which is much easier than I expected through StreamSets. The solution is also fast at determining pipelines. Additionally, I like that StreamSets has many components, such as sources, processes, execution, and other useful elements that I need to plan.

What needs improvement?

There should be a concept of creating double variables because it's still missing.

The loading machine mechanism needs to be simplified. Currently, it takes some time to get familiar with and understand that. 

Visualization and monitoring need to be improved and refined. For example, it is difficult to monitor a job to see what happened in the past seven days when a transfer occurred.

The licensing model also has room for improvement. The solution is currently expensive.

For how long have I used the solution?

I have been using the solution for five years.

What do I think about the stability of the solution?

The solution is stable.

What do I think about the scalability of the solution?

The solution is scalable. We currently have four people using StreamSets in our organization.

How are customer service and support?

The technical support is good and they prioritize issues based on their severity, so sometimes we have to wait a while for a response.

How would you rate customer service and support?

Neutral

How was the initial setup?

The initial setup is a bit complex for first-time people. There is a lot of documentation that needs to be reviewed before deploying. The deployment takes around one month.

What about the implementation team?

The implementation is completed in-house.

What was our ROI?

StreamSets simplified our data ingestion and integration process without the need for the large financial investment that would be required if we were to use other, cheaper solutions. This is due to StreamSets' security and safety in supporting various heterogeneous sources such as RDBMS and Salesforce. StreamSets ensures that we have a secure and easy way to launch any integration tool, resulting in increased profits. StreamSets is very stable, secure, and compliant, and has yielded a return on investment of around 30 percent.

What's my experience with pricing, setup cost, and licensing?

I believe the pricing is not equitable. Different businesses operate in various models and ways, so I wish StreamSets would be able to adjust their pricing depending on the intended use of the software. This would be beneficial to businesses with limited budgets. Currently, the cost of StreamSets is the same regardless of the amount of backup, which is costly.

What other advice do I have?

I give the solution an eight out of ten. StreamSets still needs to improve the monitoring and visualization before the solution can be a ten out of ten.

Since StreamSets is deployed in the cloud, we don't have any maintenance requirements or costs.

I highly recommend StreamSets; it is an excellent tool with both batch and streaming capabilities. StreamSets is a great option for anyone to try, though it does require an organization to have the budget to use it.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Sumesh Gansar - PeerSpot reviewer
Product Marketing Manager at a tech vendor with 10,001+ employees
Real User
Top 5 Leaderboard
We are now able to run pipelines that scale horizontally, improving efficiency and significantly reducing workload
Pros and Cons
  • "For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
  • "Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
  • "In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."

What is our primary use case?

My primary use case with StreamSets is to integrate large data sets from multiple sources into a destination. We also use it as a platform to ingest data and deliver data for database analytics.

How has it helped my organization?

One major benefit that we have realized with StreamSets is that we are now able to run pipelines that scale horizontally, instead of using a static service to host the service. This has improved efficiency and reduced our workload by around 85 percent. Initially, we started out with around 40 users. Now, there are 100 users. We have definitely scaled up, in terms of usage, with StreamSets.

The fact that it is a single centralized platform saves us a lot of time. It's very intuitive and very effective, saving us a lot of resources with its built-in capabilities. No manual intervention is needed, and nobody needs to oversee it. It's an "all-in-one" deal for us. We are able to save 15 to 18 hours per week. Tasks that required three people can be done with StreamSets itself.

And with its ability to integrate large data sets, we are now able to pull thousands of records instantly, thereby reducing the need to do some complex coding for this asset. That has also been a very big plus for us.

We also use it to connect our Apache Kafka with data lakes and, as a result, this connection has gotten much more efficient and quicker for us. The overall efficiency has also drastically improved for us with this. Connecting these enterprise systems using StreamSets is pretty easy. The StreamSets platform is very straightforward. There is no major coding required, so any non-technical person can also do it.

Without the need for any complex coding at all, we are able to pull records. The records are vast and very large and pulling them usually requires coding, but the fact that there is literally no coding required is a very big plus for us. Once you start to code, there is a lot of time involved and a lot of QA involved, but all of that is eliminated here.

And it has definitely helped us break down data silos. With our large amount of data, we have different data formats, and as a result, there are data silos that are present by default. With StreamSets, we were able to completely eliminate that because StreamSets has become a centralized system for us to accommodate everything. We have been able to get a single, centralized view of all our data.

We have a lot of different data formats, and transforming them manually without any tool or system is a cumbersome and frustrating process. We use StreamSets to do that. It has made that process much more elegant and efficient for us.

What is most valuable?

For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems. 

Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now. 

Apart from that, the user interface of StreamSets is very good. It's very user-friendly and very appealing. Moving data into modern analytics platforms is a very straightforward procedure. There is no difficulty involved in it.

In addition, the ETL capabilities of StreamSets are also very useful for us. We are able to extract and transform data from multiple data sources into a single, consistent data store that is loaded into our target system.

What needs improvement?

In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time.

Some visual explanation or some visually appealing knowledge-based content would be very good. That is something that I could have done with, once I started using it, because I found it very difficult.

For how long have I used the solution?

I have been using StreamSets for about a year.

What do I think about the stability of the solution?

It is definitely a stable product. In fact, it is one of the top products in the market in that particular category. We have not faced any stability issues so far, in terms of server speed, latency, or deployment.

What do I think about the scalability of the solution?

It's a scalable product. In our company, the platform is used across seven teams in our organization.

A couple of more teams are evaluating StreamSets in our organization. They're running things and asking for some feedback from our side as well. There are plans to expand our use of it.

How are customer service and support?

I have been in contact with their technical support and I would rate them very highly. They're very knowledgeable and patient. That is something that I like very much. For a very new user, it's not very easy to understand and we contact the support team over email.

We do have a relationship manager as well, who acts as the central point of contact for us. They're very prompt, knowledgeable, and friendly.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

This was one of the first products we used.

What was our ROI?

Within about three months we were able to see benefits from the system. We saw a lot of time being saved, and about a 30 percent increase in our overall efficiency.

Apart from reducing our workload and improving our efficiency, we saw a 12 percent increase in our revenue last year after we implemented StreamSets. I know people will definitely see a return on their investment from it.

What's my experience with pricing, setup cost, and licensing?

From what I hear from my team, I believe it's moderately priced because they're happy with the pricing.

What other advice do I have?

Server update maintenance is required, but that is minimal. Any product would require that type of maintenance. I don't think we are investing a lot of time and money in maintenance. The maintenance is just another cost for us. We have only two guys working on the maintenance part of the software.

It's a very intuitive product, modern, and very user-friendly in terms of the UI. Almost all our requirements have been met by StreamSets and we don't have any complaints so far.

I would recommend starting to use it as soon as possible. No tool is perfect. You have to choose the best of the lot. I certainly believe StreamSets is at the top of the ladder when it comes to similar software.

My biggest lesson from using StreamSets is that data integration can be done much more easily now. I only knew that after starting to use StreamSets. When it comes to data integration from multiple sources, and having multiple destinations, people always assume it's a time-consuming, cumbersome project. But once we started using StreamSets, all those assumptions were broken. It's very straightforward and elegant software.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Senior Data Engineer at an energy/utilities company with 1,001-5,000 employees
Real User
Top 20
Quite simple to use for anybody who has an ETL or BI background
Pros and Cons
  • "StreamSets' data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved."
  • "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."

What is our primary use case?

We are using the StreamSets DataOps platform to ingest data to a data lake.

How has it helped my organization?

Our time to value has improved because our development time has been considerably reduced. The major benefit that we are getting out of the solution is the ability to easily upskill a person who already has an ETL or BI background. We don't need to specifically look for people who know programming or have worked with Python, DataOps, or DevOps. In the market, it is easier to find people with ETL or BI skills than people with hardcore DevOps or programming skills. That is the major benefit that we are getting out of moving to a GUI-based tool like StreamSets. How quickly we deliver to our customers, as well as our ability to ingest to a data lake, has improved a lot by using this tool.

What is most valuable?

The types of the source systems that it can work with are quite varied. There are numerous source systems that it can work with, e.g., a SQL Server database, an Oracle Database, or REST API. That is an advantage we are getting. 

The most important feature is the Control Hub that comes with the DataOps Platform and does load balancing. So, we do not worry about the infrastructure. That is a highlight of the DataOps platform: Control Hub manages the data load to various engines.
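To give a feel for what spreading load across engines involves, here is a minimal Python sketch of a greedy least-loaded assignment. It is a generic illustration only and makes no claim about how Control Hub actually balances load; the pipeline names, engine names, and cost figures are hypothetical.

```python
import heapq


def assign_pipelines(pipelines, engines):
    """Greedily assign (name, cost) pipelines to whichever engine has the least load.

    pipelines: list of (pipeline_name, estimated_cost) tuples.
    engines: list of engine names.
    Returns a dict mapping each engine name to its list of pipeline names.
    """
    if not engines:
        raise ValueError("at least one engine is required")
    # Min-heap of (current_load, engine_name); the lightest engine is popped first.
    heap = [(0, name) for name in engines]
    heapq.heapify(heap)
    assignment = {name: [] for name in engines}
    # Placing the heaviest pipelines first tends to give a more even spread.
    for pipeline, cost in sorted(pipelines, key=lambda p: -p[1]):
        load, engine = heapq.heappop(heap)
        assignment[engine].append(pipeline)
        heapq.heappush(heap, (load + cost, engine))
    return assignment


plan = assign_pipelines(
    [("orders", 5), ("meters", 3), ("billing", 2)],
    ["engine-1", "engine-2"],
)
print(plan)
```

Adding another engine name to the list is all it takes to rebalance, which is the kind of scaling the reviewer describes.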

It is quite simple for anybody who has an ETL or BI background and worked on any ETL technologies, e.g., IBM DataStage, SAP BODS, Talend, or CloverETL. In terms of experience, the UI and concepts are very similar to how you develop your extraction pipeline. Therefore, it is very simple for anybody who has already worked on an ETL tool set, either for your data ingestion, ETL pipeline, or data lake requirements.

We use StreamSets to load into AWS S3 and Snowflake databases, which are then moved forward by Power BI or Tableau. It is quite simple to move data into these platforms using StreamSets. There are a lot of tools and destination stages within StreamSets for Snowflake, Amazon S3, any database, or an HTTP endpoint. It is just a drag-and-drop feature that saves a lot of time compared with writing custom code in Python. StreamSets enables us to build data pipelines without knowing how to code, which is a big advantage.

The data resilience feature is good enough for our ETL operations, even for our production pipelines at this stage. Therefore, we do not need to build our own custom framework for it since what is available out-of-the-box is good enough for a production pipeline.

StreamSets' data drift feature gives us an alert upfront so we know that the data can be ingested. Whatever the schema or data type changes, it lands automatically into the data lake without any intervention from us, but then that information is crucial to fix for downstream pipelines, which process the data into models, like Tableau and Power BI models. This is actually very useful for us. We are already seeing benefits. Our pipelines used to break when there were data drift changes, then we needed to spend about a week fixing it. Right now, we are saving one to two weeks. Though it depends on the complexity of the pipeline, we are definitely seeing a lot of time being saved.
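For readers unfamiliar with data drift, the kind of alert described here can be pictured with a small Python sketch that compares an incoming record against an expected schema. This is a generic illustration, not StreamSets' implementation; the field names and the `detect_drift` helper are hypothetical.

```python
def detect_drift(expected, record):
    """Compare a record against an expected {field_name: type} mapping.

    Returns a dict listing fields that were added, removed, or changed type.
    """
    added = sorted(set(record) - set(expected))
    removed = sorted(set(expected) - set(record))
    retyped = sorted(
        name
        for name in set(expected) & set(record)
        if not isinstance(record[name], expected[name])
    )
    return {"added": added, "removed": removed, "retyped": retyped}


expected_schema = {"meter_id": int, "reading": float}
incoming = {"meter_id": "A-17", "reading": 4.2, "unit": "kWh"}

drift = detect_drift(expected_schema, incoming)
if any(drift.values()):
    # The record can still be landed in the lake; the alert is for downstream fixes.
    print("schema drift detected:", drift)
```

The record is reported but not rejected, which mirrors the behaviour in the review: the data still lands, and the alert tells the team which downstream models need attention.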

What needs improvement?

One area for improvement is probably the GUI. It is pretty basic and a lot of improvement is required there.

In terms of security, from an architecture perspective, when we want to implement something, and because our organization is very strict when it comes to cybersecurity, we have been struggling a bit because the platform has a few gaps. Those gaps are really gaps based on our organization's requirements. These are not gaps on StreamSets' side. The solution could improve a lot in terms of having more features added to the security model, which would help us.

There are quite a few features that we wanted. One is for SAP HANA. Currently, we can only use a query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables in SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you can just point it to the schema or the 100 tables and ingest the information. However, you can't do that in SAP HANA since StreamSets is currently lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful.

For how long have I used the solution?

I have been using it for the past 12 months.

What do I think about the stability of the solution?

I have no concerns in terms of the application's core stability. We haven't had any major outages as such, and even if we had one, those were internal and related to our network, proxy, or firewall. As someone who implemented it and has been working on it day in, day out, sometimes 24/7, I am quite confident with the stability of the solution.

As with any application, it requires periodic maintenance, at least to do an upgrade. That maintenance is simply upgrading the product, and nothing more than that.

What do I think about the scalability of the solution?

A core feature of the DataOps Platform is that you can easily scale through engines when you have more pipelines running and more data to process. So, if you need to purchase more engines or cores, it is quite scalable. That is a major advantage that we are getting.

In the Control Hub Platform, the orchestration and load balancing are quite scalable. You don't need to fiddle with the existing solution. Everything is run on another engine that gets hooked up automatically to Control Hub, which makes it seamless.

StreamSets provides a kind of ready-made template, where you just have one template and can point it to any source system. You can just start ingesting, which has greatly reduced the time it takes to build our new pipelines.

How are customer service and support?

They are quite good and responsive. We have a dedicated support portal for StreamSets. We have authorized members who can raise support tickets using the portal, including myself. They have a quick turnaround with good responses, so we are quite happy as of now. I would rate the technical support between 7.5 and 8 out of 10.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We previously developed our own custom platform. We switched because maintaining a custom platform is difficult. We are not a product team. We are an energy company that serves business customers. Another thing was that the custom platform was written programmatically. So, you need a lot of people who have programming knowledge, both to maintain and use it.

The time to value is quite a critical KPI. Before, when our business needed data quickly on the platform, our previous solutions struggled to get it. Thus, our time to value has improved a lot and our customers are happy because they are able to get the data quickly.

How was the initial setup?

I was there right from the start when they adopted an open-source version. Late last year, we moved to an enterprise version, i.e., the DataOps platform. So, I worked on the 3.2.2 version, and now I am working on the 5.0 version, which is the enterprise license version.

The implementation is straightforward, except for a few hiccups with known network, proxy, and firewall issues. Other than that, it was a very simple, lean implementation.

Because we had a lot of firewall issues and issues with our optimization, it took probably four weeks for us to get things running. However, if you exclude the issues, it took probably a week to a week and a half to get things up and running.

We are working, as a separate piece of the project, to migrate whatever is running in our existing custom platform to StreamSets. From a certain date, we started to work purely on StreamSets. For any future ingestion requirements, we are using the StreamSets DataOps platform. The previous platform is inactive at the moment. We are only using it for existing pipelines, and the plan is to migrate them to the DataOps platform very soon this year.

What about the implementation team?

Two people were needed for the deployment of this solution: a cloud engineer and a senior data engineer.

What was our ROI?

First, it has saved us a lot of time because we do not need to come up with our own custom platform, which is a huge expenditure in building and maintaining the custom platform. Second, even if we go for other products in the market, there are lots of gaps with the other products. Even if we picked up another product, we would have to customize it. An off-the-shelf product is not enough to meet our needs. Therefore, StreamSets has definitely helped us in getting the information into our data lake very quickly, in terms of ingestion.

The most important thing is it has helped us from a resourcing point of view. You can easily upskill a BI or ETL resource without any programming knowledge to work with this. That is a major advantage that we are getting since we have a lot of ETL people who do not have programming knowledge. They have vast ETL experience working with GUI-based tools, and StreamSets is really useful for them.

It has drastically reduced the time that we are spending on workloads by 60% to 70% as well as reducing the time spent on ingestion by 30%. 

What's my experience with pricing, setup cost, and licensing?

It has a CPU core-based licensing, which works for us and is quite good.

Which other solutions did I evaluate?

We did evaluate other solutions. It was not a quick decision for us to take this product. We evaluated other products in the market, but they were not close to StreamSets or not in the data integration space. One thing that caught our attention with StreamSets was the processes that it could work with. Secondly, the Control Hub DataOps platform manages the load balancing, etc. We were quite interested in that since we would not need to maintain it ourselves. The third most important thing was that you can create job templates in StreamSets. This means you create a template for a particular type of ingestion. Going forward, you just change the parameters, and then you can point it to any source. This means there is less pipeline development and we can quickly ingest data into the data lake. Those are the features that we were interested in and why we switched to StreamSets.
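
The idea of a parameterized job template can be sketched in a few lines of Python. This is only an illustration of the concept using the standard library's `string.Template`; it does not use or represent the StreamSets job template API, and the parameter names (`schema`, `table`, `bucket`) and values are hypothetical.

```python
from string import Template

# One template for a type of ingestion; only the parameters change per source.
JOB_TEMPLATE = {
    "origin_query": Template("SELECT * FROM ${schema}.${table}"),
    "target_path": Template("s3://${bucket}/${schema}/${table}/"),
}


def instantiate(template: dict[str, Template], params: dict[str, str]) -> dict[str, str]:
    """Fill every placeholder in the template with the given parameters."""
    # substitute() raises KeyError if a required parameter is missing,
    # which surfaces an incomplete job definition early.
    return {key: value.substitute(params) for key, value in template.items()}


# Pointing the same template at a different source is just a new parameter set.
orders_job = instantiate(JOB_TEMPLATE, {"schema": "sales", "table": "orders", "bucket": "lake"})
```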

There is actually a gap in the entire data integration market at the moment, and StreamSets Data Collector is trying to fill that gap. The reason is because most data ingestion has to be done through programming languages, like Python or Java. We currently do not have a GUI-based tool set that is as robust as StreamSets. That is what I found out in the lab over the last 12 months. There are new products coming up, but it will still be a few more years until they are stabilized. Whereas, StreamSets is already there to solve your immediate data ingestion requirements. 

What other advice do I have?

Every tool in the market at the moment has some major gaps, especially for large enterprises. It could be the way that the data or pipeline is secured. At present, StreamSets looks like the market leader and is trying to fill that gap. For anyone going through a proof of concept for various tools, StreamSets is almost at the top. I don't think that they need to look any further.

We are working only with API, a relational database management system, and our enterprise warehouses at the moment. We are not using any streaming sort of ingestion at the moment.

We are not using Snowflake Transformer yet. It just got released. We are using a traditional Snowflake destination stage because our enterprise is huge. We have our own Snowflake architecture. We load the security in the data into our own databases using the destination stage, not Transformer yet.

I would rate the solution as 7.5 out of 10.

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
PeerSpot user
Director Data Engineering, Governance, Operation and Analytics Platform at a financial services firm with 10,001+ employees
Real User
Top 20
Ease of configuring and managing pipelines centrally
Pros and Cons
  • "I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
  • "StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."

What is our primary use case?

We are using StreamSets to migrate our on-premise data to the cloud.

What is most valuable?

I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally. It's like a plug-and-play setup.

What needs improvement?

StreamSets should provide a mechanism to perform data quality assessment while the data is being moved from the source to the target, that is, the ability to validate the data against various data rules. Then, when a data quality assessment fails, it should be able to send alerts or information to help people understand the data validation issues.
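
To make the request concrete, here is a minimal Python sketch of rule-based validation during a data move. It describes the desired behavior, not an existing StreamSets feature; the rule names, the `assess` function, and the sample records are all hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]
Rule = Callable[[Record], bool]


def assess(records: list[Record], rules: dict[str, Rule]) -> list[tuple[int, str]]:
    """Return (record index, rule name) for every rule a record fails."""
    failures = []
    for index, record in enumerate(records):
        for name, rule in rules.items():
            if not rule(record):
                failures.append((index, name))
    return failures


# Hypothetical data rules for an account record.
rules: dict[str, Rule] = {
    "account_id_present": lambda r: bool(r.get("account_id")),
    "balance_non_negative": lambda r: r.get("balance", 0) >= 0,
}

records = [
    {"account_id": "A1", "balance": 100},
    {"account_id": "", "balance": -5},
]

# Each failure could be turned into an alert for the data owners.
alerts = [f"record {i} failed rule '{name}'" for i, name in assess(records, rules)]
```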

For how long have I used the solution?

I have been using StreamSets for a year and a half. 

What do I think about the stability of the solution?

It's reasonably stable.

What do I think about the scalability of the solution?

It's reasonably easy to scale. Around 25 to 30 end users are using this solution in our organization.

How are customer service and support?

Customer service and support are good. 

How would you rate customer service and support?

Positive

How was the initial setup?

It's reasonably easy to deploy. However, since it is used at an enterprise level, it requires maintenance. So we had a maintenance contract. 

In the financial industry, we have very strict regulations around deploying something in the cloud. So, it requires a lot of permission and other processes.

Just one person is enough for the maintenance. 

What's my experience with pricing, setup cost, and licensing?

The pricing was reasonably economical and easy for us to afford when we engaged with StreamSets. It was not part of Software AG at that time.

What other advice do I have?

It's a very good tool. Overall, I would rate the solution an eight out of ten. 

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Saket Pandey - PeerSpot reviewer
Product Manager at a hospitality company with 51-200 employees
Real User
Top 5 Leaderboard
Provides a good bifurcation rate and accuracy, and saves time and money
Pros and Cons
  • "The ability to have a good bifurcation rate and fewer mistakes is valuable."
  • "One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing."

What is our primary use case?

We were receiving data from hospitals or any kind of healthcare service providers in the country. We were dominantly operating in the US. When we received that data, we had to classify it into different repositories or different datasets. This data was sent to different vendors, and for that, the data needed to get processed in different ways. We needed to bifurcate data at many steps with different kinds of filters. For that, we used StreamSets.

How has it helped my organization?

We could bifurcate the datasets that we received from different hospitals. We could bifurcate it on the basis of the medical requirements of the hospitals, and sometimes, on the basis of the schedule or purpose. We were obtaining data that we could then supply to some consulting firms or other sources.

StreamSets saved us time. The accuracy was pretty good, and it was definitely better than what we were using previously. Earlier, we had hired two people who were doing the job manually, and we were also using some other platform. We had to pay for them. Overall, we have saved a lot of time, and the accuracy has improved as well. We didn't calculate the time savings, but I believe we saved about three days in a week, so there were about 30% to 40% time savings.

StreamSets reduced the workload. There was a 10% to 15% reduction in the workload.

StreamSets helped us to scale our data operations. The capacity limit that we purchased was incredible. We never reached it, but it helped us to increase or scale our operation. Especially in months when we received a higher number of entries, we were able to perform our work on time.

What is most valuable?

The ability to have a good bifurcation rate and fewer mistakes is valuable. In our scenario, when we had to bifurcate the data, we did not completely cut the data. We made a different route for one set of data, which went into a different operating system. There was also a complete set of data along with the original data that got cut, which once again went through the filtration process, and this kept repeating. The other solutions that were in place did not provide this capability. With the solutions that we were using earlier, we had to reprocess the data again and again from the start. It was a time-consuming process.
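
One way to read this routing pattern is that each filter copies its matching records to a separate route while the complete dataset stays available to the next filter. The Python sketch below illustrates that reading only; it is not StreamSets code, and the route names, predicates, and sample records are hypothetical.

```python
from typing import Any, Callable

Record = dict[str, Any]
Stage = tuple[str, Callable[[Record], bool]]


def bifurcate(records: list[Record], stages: list[Stage]) -> dict[str, list[Record]]:
    """Copy matching records to a named route per stage without consuming the input."""
    routes: dict[str, list[Record]] = {}
    for name, predicate in stages:
        # Every stage filters the complete dataset, so nothing has to be
        # reloaded from the start between filters.
        routes[name] = [record for record in records if predicate(record)]
    return routes


dataset = [
    {"hospital": "H1", "purpose": "billing"},
    {"hospital": "H1", "purpose": "scheduling"},
    {"hospital": "H2", "purpose": "billing"},
]

routes = bifurcate(
    dataset,
    [
        ("billing_vendor", lambda r: r["purpose"] == "billing"),
        ("h1_vendor", lambda r: r["hospital"] == "H1"),
    ],
)
```

A record can land in more than one route here, which matches the description of the original data continuing through further filtration.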

Their support system was pretty good. When we were setting up the bifurcation protocols that we wanted to set up, we had a few support calls with them, and those were really helpful.

What needs improvement?

The design or the way they have set up the protocol is pretty good. One thing that I would like to add is the ability to manually enter data. The way the solution currently works is we don't have the option to manually change the data at any point in time. Being able to do that will allow us to do everything that we want to do with our data. Sometimes, we need to manually manipulate the data to make it more accurate in case our prior bifurcation filters are not good. If we have the option to manually enter the data or make the exact iterations on the data set, that would be a good thing. It does not have that feature. None of the solutions provides this feature, but this is the feature that we are looking for. If we could bifurcate the data or do manual manipulation of data at any point in time, it would be a game changer. 

Its initial setup could also be a bit easier.

For how long have I used the solution?

I used this solution for about a year.

What do I think about the stability of the solution?

It's a stable product. We used it for about a year, and we hardly had to shut it down.

What do I think about the scalability of the solution?

We are a medium enterprise. We only have three departments in our company, and only one of the departments is using it. Salespeople don't use it. The development people don't use it. We are the ones using it, and our job is to process the information, so only one department is using the solution. We have about 18 people in the department.

For enterprises up to medium size, it's a good choice. You can scale from one million to ten million data files. I don't believe they offer the service for a hundred million or one billion datasets. It isn't very scalable for large enterprises, but for small and medium enterprises, it's good.

How are customer service and support?

I'd rate them an eight out of ten. The only reason for not giving them a ten out of ten is that if you're doing very important work and you need to get the solution the same day, it's a bit tough to have the team support you in a very short period of time. They usually give you appointments about a day or two days later. Other than that, everything is good.

How would you rate customer service and support?

Positive

Which solution did I use previously and why did I switch?

We were using another solution previously. The major reason for switching to StreamSets was that we needed to scale our operations. Our prior solution could have been scaled, but the cost of scaling was a bit higher. We would have had to hire one more person to be able to scale, but we did not want to hire more people, so we decided to use a completely automated solution for this part so that it could be handled by only one of our team members. That was the primary requirement. The cost-benefit analysis was done by one of our peers. His proposal was pretty good, and everyone agreed to it.

How was the initial setup?

Its initial setup is a bit tough. You need to have the technical expertise to do that. The support team is good. They help you around, but if they could make it a bit easier, it would be better.

I believe it operates only from the cloud. We also received the data from our associations on the cloud. We processed it on the cloud, and everything happened on the cloud.

The initial setup was complex because we were not able to directly link the data we were receiving with the StreamSets solution. Linking it required us to fill in or enter some information in StreamSets, but we were not able to figure out what to enter. For that part, we needed their help.

We spent about a week. For the first three days, our team members were trying their best to do it, but then we had to schedule a meeting with them. In terms of the number of people, only one person was working with our team, and there were three people working with the product. I was also involved in the product as a product manager, but I was not directly operating that system.

It didn't require any maintenance as such. Any maintenance activities were related to our side of things. There were mistakes on our end. When we were entering different data, we had to do different configurations in the system.

What was our ROI?

We did the cost-benefit analysis before buying the solution, and it performed even better than projected. We were able to replace two of our staff members who were doing this work. The cost that we paid for this solution was much lower than their salaries, so on the cost-benefit side of things, it was a good deal. We saved about two people's wages, which is about $6,000 a month, and we also saved 15% of a week's time. These two were the biggest returns on the investment. The accuracy was also a bit higher.

What's my experience with pricing, setup cost, and licensing?

Its pricing is pretty much up to the mark. For smaller enterprises, it could be a big price to pay at the initial stage of operations, but the moment you have Series B or Series C funding and you want to scale up your operations and aren't much worried about the funds, at that point in time, you would need a solution that could be scaled. Simultaneously, you need a solution that you don't want to use on a very long-term basis. This solution could not be applied if we were operating with all the hospital chains in the US. We were operating just with one hospital. That's why it worked pretty well, so for medium enterprises, I believe it's very good.

What other advice do I have?

To those evaluating StreamSets, I'd advise doing a cost-benefit analysis because the way of using StreamSets differs from person to person. Someone else might have a very different use case, and they may not run into profit using the solution. For us, it was a good solution because we were hiring people for this work. People were doing the job manually. We saved both time and money, so doing a cost-benefit analysis would be the best thing.

If you are looking to expand your domain or range of operations, StreamSets is very helpful. If you are just looking for a better data analytics tool that can do bifurcation on data, I believe there are other tools or services available in the market that do not focus on the expansion of operations. They focus on doing better and more complex bifurcations. 

StreamSets enables you to build data pipelines without knowing how to code. After generating a few responses, you have to enter some basic syntax or code, but generally, one can do a lot of no-code stuff, which was not an important aspect for us because we were operating in the IT space, and our entire team was capable of entering all the syntaxes that were required. It was not an issue for us at any point in time. In fact, in the operations that we were performing, we only used code. When we were testing out our initial datasets, we used some no-code features that were there, but at the later stage, we used only syntaxes.

We did not connect to the messaging systems, but we connected some enterprise databases. We were operating with a set of hospitals in the US, and we had to connect with them only the first time. Afterward, it was the data that was passing through the pipeline. Initially, for a completely new user, it's a bit tricky. Some technical expertise is required. It's a bit tough, but because the support team is there, one would be able to do it.

Overall, I would rate StreamSets an eight out of ten.

Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Al Mercado - PeerSpot reviewer
AI Engineer at Techvanguard
Real User
A no-code solution with a drag-and-drop UI, but the execution engine should be better
Pros and Cons
  • "The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
  • "The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."

What is our primary use case?

I was working on an integration project where I was using the StreamSets platform. I was looking at both their Data Collector and their Transformer. The idea was to integrate it with AWS SageMaker Canvas. Both StreamSets and Canvas are what they call no-code options. StreamSets is for data pipelining, managing your data flow, and transforming your data. SageMaker is from AWS, and Canvas is basically their no-code option for machine learning.

I was trying to connect it to a data object repository. For AWS, that's a specific managed service called S3. I wasn't trying to run it with a data warehouse.

How has it helped my organization?

It's still in the trial stage. I don't get a 30-day trial period or anything like that. I just got to write about what's involved and then see if that's something that justifies the use case for going ahead and purchasing the license for it.

It enables you to build data pipelines without knowing how to code. It abstracts away the need for Spark or anything like that. This ability is highly important because it reduces development time.

It saves time because you don't have to write code. 

It saves money by not having to hire people with specialized skills. You don't need Spark or anything like that for doing the same thing.

It helps to scale your data operations. You can get to the execution engine and provision bigger machines or bigger clusters. You can scale out to however much data you need to scale out to.

What is most valuable?

The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows.

What needs improvement?

The execution engine could be improved. When I was at their session, they were using some obscure platform to run it. There is a controller, which controls what happens on that, but you should be able to easily do this on any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that.

It can break down data silos within the organization. One person can do the whole thing with StreamSets and SageMaker Canvas, but it hasn't yet had any effect on our operations or business because it's one of those situations where you can either get a demo from them or you basically have to go to one of these sessions and they give you temporary credentials and try to work with your use case. Personally, I would change their model a bit and give a two-week trial license for a cloud platform at the very least. You can then try to get something to work or call up their technical department and say, "Look, I've been evaluating this thing for the last few days. I don't know exactly how to resolve this issue."

For how long have I used the solution?

I started using it in June of this year. 

What do I think about the stability of the solution?

The whole issue of the execution engine needs to be better resolved. If you pick a cloud, why isn't it working with this cloud? Or what do I need to do to get it to work with one specific cloud service if it can be deployed across multiple clouds?

What do I think about the scalability of the solution?

It seems pretty highly scalable to me. That's not going to be an issue. Just the administration of it could be an issue.

It's currently being used in a dev department for machine learning. It's being used by the business analyst team.

How are customer service and support?

I haven't contacted their support.

Which solution did I use previously and why did I switch?

AWS has native solutions. There are AWS Data Wrangler and others that come bundled with their services, like AWS Glue. We haven't yet switched to StreamSets. It's still in the evaluation stage, but the no-code and the drag-and-drop option with a GUI are some of the things that seem to resonate with people. 

How was the initial setup?

I was involved in its setup. I was the one who basically had to try to get it to run with whatever process or custom processor I developed. 

It was complex to set up. I had to go to the sessions. On a couple of occasions, I was doing it directly from the cloud platform, and apparently, that wasn't the way to do it. You have to go through their universal designer platform first. 

In terms of maintenance, once you're deployed from the cloud, that's all handled for you. It's managed for you directly from the cloud service. So, you don't have to worry about that. They maintain their design platform.

What about the implementation team?

I didn't use any consultant.

What's my experience with pricing, setup cost, and licensing?

I didn't get into that with the StreamSets representative. It seems to be pay-as-you-go, but I don't know exactly how they do it.

Which other solutions did I evaluate?

Alteryx is another option. It's a similar tool, and it looks almost the same as StreamSets. Alteryx is something that's available for any cloud. It doesn't matter which cloud. You go on the various clouds, and you look and see what they have.

What other advice do I have?

To those evaluating this solution, I would advise looking into how it integrates with the cloud service that they're going to try it with. Does it naturally integrate better with AWS or Azure? It's one of those situations.

I used StreamSets' ability to move data into a modern analytics platform. That's what AWS SageMaker Canvas is. It's like predictive analytics. In terms of ease of moving data into this analytics platform, doing the design on the StreamSets platform is one thing, but having the execution engine and getting it provisioned is a totally different ball game. Basically, that's where its limitation comes in.

Overall, I would rate it a seven out of ten. The issue that was never resolved for me was if you're running a compute or execution engine on AWS versus Azure versus GCP, how does that integration work because that has got nothing to do with StreamSets? That is outside of StreamSets. You're now dealing with the cloud service, and there's a good reason for that.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: PeerSpot contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.
Updated: May 2024
Product Categories
Data Integration