IBM InfoSphere DataStage vs SAP Data Hub comparison

 

Comparison Buyer's Guide

Executive Summary
 

Categories and Ranking

IBM InfoSphere DataStage
Average rating: 7.8
Number of reviews: 37
Ranking in other categories: Data Integration (7th)

SAP Data Hub
Average rating: 7.6
Number of reviews: 3
Ranking in other categories: Data Governance (26th), Metadata Management (11th)

Market share comparison

As of June 2024, in the Data Integration category, the market share of IBM InfoSphere DataStage is 6.6%, up 6.5% from the previous year. The market share of SAP Data Hub is 0.6%, down 11.6% from the previous year. These figures are calculated from PeerSpot user engagement data.
Unique categories (market share):
IBM InfoSphere DataStage: no other categories found
SAP Data Hub: Data Governance 1.1%, Metadata Management 0.9%

Featured Reviews

Murali B - PeerSpot reviewer
Mar 28, 2024
Facilitated our peak data integration projects, offers a good GUI, and has strong connector availability
DataStage facilitated our peak data integration projects. For example, big data integrations have happened, particularly when we worked with BigQuery files... that integration server. DataStage's parallel processing capabilities have improved data tasks. When I worked with DataStage, it could handle around two terabytes of data. We have other appliances as well, but we're processing data concurrently. It was good. My team supported it well, and everything worked fine. The GUI was good. Compared to Cloud Pak for Data, we have some enhanced connectors in the standard InfoSphere DataStage version. That standard version is really good; it's easy to use. When we want to find out the absolute quality of data, the governance features really help. For example, when we tried to identify discrepancies between systems, they worked well.
VM
Sep 22, 2023
The solution is seamless, but the database sometimes leads to confusion
We used to have multiple different kinds of databases, which internally had different compliance levels. Retention management is very different now. If the policy was live and the claim had been completed, I couldn't archive the claim. I needed to keep the referential integrity of that claim and understand which policy paid out the claim. With this solution, the policy came in six months ago and qualified for archiving. The claim had been paid and closed in every environment, including the reporting system, the claims system, etc. With the payment set gateway, I can just go and archive. But we had a hard time during this process. I rate the overall solution a seven out of ten.

Quotes from Members

We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
 

Pros

"It is quite useful and powerful."
"The performance optimization is quite good in DataStage. It provides parallelism and pipelining mechanisms."
"When we have needed help from the IBM team, they were helpful. Our company is a premium partner, so we get fast responses."
"DataStage works better with Linux operating systems when the application services are hosted on Linux system equipment, but it's powerful on Windows too."
"The most valuable feature is the ability to transfer information via notes."
"The ETL tools are probably the most valuable feature. It is an IBM tool with a friendly UI, and it makes things more comfortable."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it with up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than that of other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful for easy traceability of columns and tables."
"It's a robust solution."
"Its connection to on-premise products is the most valuable. We mostly use the on-premise connection, which is seamless. This is what we prefer in this solution over other solutions. We use it most for orchestration, where the data comes from different categories. Its other features are very similar to what open source gives us. Their push-down approach is the most advantageous: they push most of the processing onto the data source itself. This means that they have a serverless kind of thing, and they don't process the data inside a product such as Data Hub. They process the data where it comes from. If it is coming from HANA, to capture the data or process it for analytics, orchestration, or management, they go to the HANA database and give it out. They don't process it on Data Hub. This push-down approach increases the processing speed a little because the data is processed where it sits. That's the best part and an advantage. I have used another product where they used to capture the data first and then process it and give it. In Data Hub, it is the reverse. They process it first and give it, and then they apply their own manipulations. They lead in terms of business functions. No other solution has business functions already implemented to perform business analysis. They have a lot of prebuilt business functions for machine learning and orchestration, which we can use directly to get an analysis out of the existing data. Most of the data is sitting there as enterprise data. That's a major advantage that they have."
"The most valuable feature is the S/4HANA 1909 On-Premise."
"SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database."
 

Cons

"I'd like to be able to do more with the data and metadata, including copying and pasting, et cetera."
"The template mapping could be easier."
"Improvements for DataStage could include better integration with modern data sources like cloud solutions and documents, along with enhancing its capability to handle non-structured data."
"We would be happy to see the ability to return several parameters from jobs in future versions. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
"DataStage is quite expensive. It is very hard to find a DataStage consultant in Turkey."
"The error messaging needs to be improved."
"The response time from support is slow and needs to be improved."
"Reduced cost would allow more customers to choose the product. It's quite expensive in relation to the cost of other similar solutions."
"Nowadays there are some inconsistencies in databases; however, they upgrade and release new versions to the market."
"The company has everything offshore."
"In 2018, connecting it to outside sources, such as IoT products or IoT-enabled big data Hadoop, was a little complex. It was not smooth at the beginning. It was unstable. It took a lot of time for the initial data load. Sometimes, the connection broke, and we had to restart the process, which was a major issue, but they might have improved it now. It is very smooth with SAP HANA on-premise system, SAP Cloud Platform, and SAP Analytics Cloud. It could be because these are their own products, and they know how to integrate them. With Hadoop, they might have used open-source technologies, and that's why it was breaking at that time. They are providing less embedded integration because they want us to use their other products. For example, they don't want us to remove SAP Analytics Cloud and put everything in Data Hub. They want us to use SAP Analytics Cloud separately, not inside Data Hub. On the integration part, it lacks real-time analytics, and it is slow. They should embed SAP Analytics Cloud inside Data Hub or support some kind of analysis. They do provide some analysis, but it is not extensive. They are mostly open source, so we need a lot of developers or data scientists to go in and implement Python algorithms. It would be better if they provided their own existing algorithms with some connections and drop-down menus so that we could just configure them. Increasing the embedded integrations would make things really quick. It would also improve process efficiency and processing power. Its performance needs improvement. It is a little slow. It is not the best in the market, and there are other products that are much better. In terms of technology and performance, it is a little slow compared to Microsoft and other data orchestration products. I haven't used other products, but I have read about those products, their settings, and the millisecond timings they claim. In Azure Purview, they say that they can copy, manage, or transform the data within milliseconds. They say that they can transform 100 gigabytes of data within three to five seconds, which is something SAP cannot do. It generally takes a lot of time to process that much data. However, I have never tested Azure."

Pricing and Cost Advice

"High cost of ownership: they could take a page from open-source software."
"It's quite expensive."
"I have no information on the exact pricing for IBM InfoSphere DataStage because the solution is usually procured by the clients my company works with, though the pricing is higher compared to other solutions, so many clients choose to go with a different solution rather than IBM InfoSphere DataStage."
"The price is high, but there are no licensing fees."
"Our internal team takes care of group licensing and cost. We don't have individual licenses; we have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one hour of execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on it as well. All this comes out to around $6,000. We would, however, like to have some cost reduction."
"Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
"It's very expensive."
"The solution is cheap."
"The Cloud is very expensive, but the previous SAP HANA service is okay."
787,226 professionals have used our research since 2012.
 

Top Industries

By visitors reading reviews

IBM InfoSphere DataStage
Financial Services Firm: 26%
Manufacturing Company: 11%
Computer Software Company: 10%
Insurance Company: 8%

SAP Data Hub
Computer Software Company: 15%
Manufacturing Company: 13%
Financial Services Firm: 12%
Government: 8%

Company Size

By reviewers
No data available

Questions from the Community

Would you upgrade to more premium versions of IBM InfoSphere DataStage?
My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For ...
Is IBM InfoSphere DataStage more difficult to use compared to other tools in the field?
I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work ...
Do you rely on IBM Cloud Paks for your data? Have you utilized this product, or do you use IBM InfoSphere DataStage without it?
IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands...
What do you like most about SAP Data Hub?
SAP is one of the most seamless ERPs that have integrated SAP archiving within Excel. I have not seen this with any other database.
What needs improvement with SAP Data Hub?
We moved from Oracle. If you're aware of your monitoring system, the RPU market, and the managed system, you should move to HANA, which is an innovative database built by SAP itself. However, this ...
What is your primary use case for SAP Data Hub?
I technically handle the database, like cycle management projects. When transaction data comes in, we see it based on the retention periods. We have to move the data to some secure storage rather t...
 


Sample Customers

IBM InfoSphere DataStage: Dubai Statistics Center, Etisalat Egypt
SAP Data Hub: Kaeser Kompressoren, HARTMANN
Find out what your peers are saying about Microsoft, Informatica, Oracle and others in Data Integration. Updated: May 2024.