SAP Data Hub vs StreamSets comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

SAP Data Hub
Average Rating
7.6
Number of Reviews
3
Ranking in other categories
Data Governance (26th), Metadata Management (11th)
StreamSets
Average Rating
8.4
Number of Reviews
24
Ranking in other categories
Data Integration (8th)
 

Featured Reviews

VM
Sep 22, 2023
The solution is seamless, but the database sometimes leads to confusion
We used to have multiple different kinds of databases, which internally, had different compliance levels. Retention management is very different now. If the policy is live and the claim has been completed, I couldn't archive the claim. I needed to keep a reference integrity of that claim and understand which policy paid out the claim. With this solution, the policy came in six months ago and qualified for archiving. The claim had been paid and in every environment, the claim had been closed, including the reporting system, the claims system, etc. With the payment set gateway, I can just go and archive. But, we had a hard time during this process. I rate the overall solution a seven out of ten.
MI
Mar 17, 2023
It's lightweight and well-integrated, and it saves a lot of money and time
There are so many things that need to be improved. For the StreamSets cloud user interface, there aren't enough use cases and examples for the main problems. In addition, the hybrid data sets cannot be joined in a data connector, which is a significant limitation. There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline. It isn't helpful when you need to apply the same logic for multiple sources. It becomes difficult because you need to create more pipelines and then add coordination between them. Initially, it's hard to find out or master the logic behind it. It can be hard if you aren't technical enough. There is scope for improvement because it's not straightforward. You need to go through the documentation and make sure that you understand every step. For me, it was a challenging model.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"The most valuable feature is the S/4HANA 1909 On-Premise"
"Its connection to on-premise products is the most valuable. We mostly use the on-premise connection, which is seamless. This is what we prefer in this solution over other solutions. We are using it the most for the orchestration where the data is coming from different categories. Its other features are very much similar to what they are giving us in open source. Their push-down approach is the most advantageous, where they push most of the processing on to the same data source. This means that they have a serverless kind of thing, and they don't process the data inside a product such as Data Hub. They process the data from where the data is coming out. If it is coming from HANA, to capture the data or process it for analytics, orchestration, or management, they go to the HANA database and give it out. They don't process it on Data Hub. This push-down approach increases the processing speed a little bit because the data is processed where it is sitting. That's the best part and an advantage. I have used another product where they used to capture the data first and then they used to process it and give it. In Data Hub, it is in reverse. They process it first and give it, and then they put their own manipulations. They lead in terms of business functions. No other solution has business functions already implemented to perform business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get an analysis out from the existing data. Most of the data is sitting as enterprise data there. That's a major advantage that they have."
"SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database."
"It is a very powerful, modern data analytics solution, in which you can integrate a large volume of data from different sources. It integrates all of the data and you can design, create, and monitor pipelines according to your requirements. It is an all-in-one day data ops solution."
"StreamSets’ data drift resilience has reduced the time it takes us to fix data drift breakages. For example, in our previous Hadoop scenario, when we were creating the Sqoop-based processes to move data from source to destinations, we were getting the job done. That took approximately an hour to an hour and a half when we did it with Hadoop. However, with the StreamSets, since it works on a data collector-based mechanism, it completes the same process in 15 minutes of time. Therefore, it has saved us around 45 minutes per data pipeline or table that we migrate. Thus, it reduced the data transfer, including the drift part, by 45 minutes."
"Also, the intuitive canvas for designing all the streams in the pipeline, along with the simplicity of the entire product are very big pluses for me. The software is very simple and straightforward. That is something that is needed right now."
"The most valuable features are the option of integration with a variety of protocols, languages, and origins."
"In StreamSets, everything is in one place."
"It is really easy to set up and the interface is easy to use."
"I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
"The best feature that I really like is the integration."
 

Cons

"The company has everything offshore."
"In 2018, connecting it to outside sources, such as IoT products or IoT-enabled big data Hadoop, was a little complex. It was not smooth at the beginning. It was unstable. It took a lot of time for the initial data load. Sometimes, the connection broke, and we had to restart the process, which was a major issue, but they might have improved it now. It is very smooth with SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud. It could be because these are their own products, and they know how to integrate them. With Hadoop, they might have used open-source technologies, and that's why it was breaking at that time. They are providing less embedded integration because they want us to use their other products. For example, they don't want to go and remove SAP Analytics Cloud and put everything in Data Hub. They want us to use SAP Analytics Cloud somewhere else and not inside the Data Hub. On the integration part, it lacks real-time analytics, and it is slow. They should embed the SAP Analytics Cloud inside Data Hub or support some kind of analysis. They do provide some analysis, but it is not extensive. They are moreover open source. So, we need a lot of developers or data scientists to go in and implement Python algorithms. It would be better if they can provide their own existing algorithms and give some connections and drop-down menus to go and just configure those. It will make things really quick by increasing the embedded integrations. It will also improve the process efficiency and processing power. Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better than this. In terms of technology and performance, it is a little slow as compared to Microsoft and other data orchestration products. I haven't used other products, but I have read about those products, their settings, and the milliseconds that they do. In Azure Purview, they say that they can copy, manage, or transform the data within milliseconds. They say that they can transform 100 gigabytes of data within three to five seconds, which is something SAP cannot do. It generally takes a lot of time to process that much amount of data. However, I have never tested out Azure."
"Nowadays there are some inconsistencies in data bases, however, they upgrade and release the versions to market."
"I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
"If you use JDBC Lookup, for example, it generally takes a long time to process data."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
"In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
"StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
 

Pricing and Cost Advice

"The Cloud is very expensive, but SAP HANA previous service is okay."
"We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
"There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
"StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
"The pricing is good, but not the best. They have some customized plans you can opt for."
"The overall cost for small and mid-size organizations needs to be better."
"It's not expensive because you pay per month, and the tasks you can perform with it are huge. It's reliable and cost-effective."
"StreamSets is expensive, especially for small businesses."
"It has a CPU core-based licensing, which works for us and is quite good."
report
Use our free recommendation engine to learn which Data Governance solutions are best for your needs.
787,383 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews
Computer Software Company
15%
Manufacturing Company
13%
Financial Services Firm
12%
Government
8%
Financial Services Firm
17%
Computer Software Company
13%
Manufacturing Company
8%
Government
7%
 

Company Size

By reviewers
Large Enterprise
Midsize Enterprise
Small Business
No data available
 

Questions from the Community

What do you like most about SAP Data Hub?
SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database.
What needs improvement with SAP Data Hub?
We moved from Oracle. If you're aware of your monitoring system, the RPU market, and the managed system, you should move to HANA, which is an innovative database built by SAP itself. However, this ...
What is your primary use case for SAP Data Hub?
I technically handle the database, like cycle management projects. When transaction data comes in, we see it based on the retention periods. We have to move the data to some secure storage rather t...
What do you like most about StreamSets?
The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customiz...
What needs improvement with StreamSets?
We often faced problems, especially with SAP ERP. We struggled because many columns weren't integers or primary keys, which StreamSets couldn't handle. We had to restructure our data tables, which ...
What is your primary use case for StreamSets?
StreamSets is used for data transformation rather than ETL processes. It focuses on transforming data directly from sources without handling the extraction part of the process. The transformed data...
 

Learn More

Video not available
 

Overview

 

Sample Customers

Kaeser Kompressoren, HARTMANN
Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
Find out what your peers are saying about Microsoft, Collibra, Informatica and others in Data Governance. Updated: June 2024.
787,383 professionals have used our research since 2012.