We performed a comparison between SAP Data Hub and StreamSets based on real PeerSpot user reviews.
Find out what your peers are saying about Microsoft, Collibra, Informatica and others in Data Governance."The most valuable feature is the S/4HANA 1909 On-Premise"
"SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database."
"Its connection to on-premise products is the most valuable. We mostly use the on-premise connection, which is seamless. This is what we prefer in this solution over other solutions. We are using it the most for the orchestration where the data is coming from different categories. Its other features are very much similar to what they are giving us in open source. Their push-down approach is the most advantageous, where they push most of the processing on to the same data source. This means that they have a serverless kind of thing, and they don't process the data inside a product such as Data Hub. They process the data from where the data is coming out. If it is coming from HANA, to capture the data or process it for analytics, orchestration, or management, they go to the HANA database and give it out. They don't process it on Data Hub. This push-down approach increases the processing speed a little bit because the data is processed where it is sitting. That's the best part and an advantage. I have used another product where they used to capture the data first and then they used to process it and give it. In Data Hub, it is in reverse. They process it first and give it, and then they put their own manipulations. They lead in terms of business functions. No other solution has business functions already implemented to perform business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get an analysis out from the existing data. Most of the data is sitting as enterprise data there. That's a major advantage that they have."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"The ability to have a good bifurcation rate and fewer mistakes is valuable."
"In StreamSets, everything is in one place."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too"
"One of the things I like is the data pipelines. They have a very good design. Implementing pipelines is very straightforward. It doesn't require any technical skill."
"StreamSets Transformer is a good feature because it helps you when you are developing applications and when you don't want to write a lot of code. That is the best feature overall."
"The company has everything offshore."
"Nowadays there are some inconsistencies in data bases, however, they upgrade and release the versions to market."
"In 2018, connecting it to outside sources, such as IoT products or IoT-enabled big data Hadoop, was a little complex. It was not smooth at the beginning. It was unstable. It took a lot of time for the initial data load. Sometimes, the connection broke, and we had to restart the process, which was a major issue, but they might have improved it now. It is very smooth with SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud. It could be because these are their own products, and they know how to integrate them. With Hadoop, they might have used open-source technologies, and that's why it was breaking at that time. They are providing less embedded integration because they want us to use their other products. For example, they don't want to go and remove SAP Analytics Cloud and put everything in Data Hub. They want us to use SAP Analytics Cloud somewhere else and not inside the Data Hub. On the integration part, it lacks real-time analytics, and it is slow. They should embed the SAP Analytics Cloud inside Data Hub or support some kind of analysis. They do provide some analysis, but it is not extensive. They are moreover open source. So, we need a lot of developers or data scientists to go in and implement Python algorithms. It would be better if they can provide their own existing algorithms and give some connections and drop-down menus to go and just configure those. It will make things really quick by increasing the embedded integrations. It will also improve the process efficiency and processing power. Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better than this. In terms of technology and performance, it is a little slow as compared to Microsoft and other data orchestration products. I haven't used other products, but I have read about those products, their settings, and the milliseconds that they do. In Azure Purview, they say that they can copy, manage, or transform the data within milliseconds. They say that they can transform 100 gigabytes of data within three to five seconds, which is something SAP cannot do. It generally takes a lot of time to process that much amount of data. However, I have never tested out Azure."
"Using ETL pipelines is a bit complicated and requires some technical aid."
"The documentation is inadequate and has room for improvement because the technical support does not regularly update their documentation or the knowledge base."
"We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
"The design experience is the bane of our existence because their documentation is not the best. Even when they update their software, they don't publish the best information on how to update and change your pipeline configuration to make it conform to current best practices. We don't pay for the added support. We use the "freeware version." The user community, as well as the documentation they provide for the standard user, are difficult, at best."
"The logging mechanism could be improved. If I am working on a pipeline, then create a job out of it and it is running, it will generate constant logs. So, the logging mechanism could be simplified. Now, it is a bit difficult to understand and filter the logs. It takes some time."
"I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"The data collector in StreamSets has to be designed properly. For example, a simple database configuration with MySQL DB requires the MySQL Connector to be installed."
SAP Data Hub is ranked 25th in Data Governance with 3 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. SAP Data Hub is rated 7.6, while StreamSets is rated 8.4. The top reviewer of SAP Data Hub writes "The solution is seamless, but the database sometimes leads to confusion". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". SAP Data Hub is most compared with Microsoft Purview, SAP Data Services, Alation Data Catalog and Azure Data Factory, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and Oracle GoldenGate.
We monitor all Data Governance reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.