IBM Cloud Pak for Data vs IBM InfoSphere DataStage comparison

IBM Cloud Pak for Data: 4,032 views | 2,639 comparisons | 84% willing to recommend
IBM InfoSphere DataStage: 10,952 views | 9,105 comparisons | 82% willing to recommend
Comparison Buyer's Guide
Executive Summary
Updated on Mar 6, 2024

We compared IBM InfoSphere DataStage and IBM Cloud Pak for Data across several parameters, based on our users' reviews.

IBM InfoSphere DataStage is praised for its strong data integration, connectors, workflow management, ETL functionalities, and data quality controls. In contrast, IBM Cloud Pak for Data is commended for its analytics capabilities, user interface, data management tools, integration, scalability, governance, security, collaboration, and AI-driven features. Feedback on customer service, setup duration, pricing, and ROI varies between the two products.

Features: IBM InfoSphere DataStage is praised for its strong data integration capabilities, comprehensive set of connectors, efficient workflow management, and robust ETL functionalities. On the other hand, IBM Cloud Pak for Data is valued for its robust analytics capabilities, ease of use, comprehensive data management tools, seamless integration, and advanced data governance and security features. It also offers AI-driven capabilities like machine learning and predictive analytics.

Pricing and ROI: The available data provides no setup cost information for IBM InfoSphere DataStage and no pricing or licensing information for IBM Cloud Pak for Data. Likewise, there is no data from which to determine the ROI of either product.

Room for Improvement: IBM InfoSphere DataStage does not have specific areas for improvement identified in the available responses. Similarly, there is no specific feedback or review available for IBM Cloud Pak for Data on what needs improvement.

Deployment and customer support: The available summaries include no feedback on setup duration for either product, so the two cannot be compared on that basis. There is likewise not enough information to summarize customer service and support for IBM InfoSphere DataStage, and the reviews provided contain little feedback on customer service and support for IBM Cloud Pak for Data.

The summary above is based on 24 interviews we conducted recently with IBM InfoSphere DataStage and IBM Cloud Pak for Data users. To access the full review transcripts, download our report.

To learn more, read our detailed IBM Cloud Pak for Data vs. IBM InfoSphere DataStage Report (Updated: March 2024).
769,599 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

IBM Cloud Pak for Data
  • "One of Cloud Pak's best features is the Watson Knowledge Catalog, which helps you implement data governance."
  • "The most valuable feature of IBM Cloud Pak for Data is the Modeler flows. The ability to develop models using a graphical approach and the capability to connect to various sources, as well as the data virtualization capabilities, allow me to easily access and utilize data that is dispersed across different sources."
  • "You can model the data there, connect the data models with the business processes and create data lineage processes."
  • "Cloud Pak's most valuable features are IBM MQ, IBM App Connect, IBM API Connect, and ISPF."
  • "What I found most helpful in IBM Cloud Pak for Data is containerization, which means it's easy to shift and leave in terms of moving to other clouds. That's an advantage of IBM Cloud Pak for Data."
  • "The most valuable features are data virtualization and reporting."
  • "Its data preparation capabilities are highly valuable."
  • "Scalability-wise, I rate the solution a nine or ten out of ten."

IBM InfoSphere DataStage
  • "I am impressed with the tool's ETL tracing."
  • "Highly customizable: Allowing you to handle multiple data latencies (scheduled batch, on-demand, and real-time) in the same job."
  • "IBM is stable and accurate to monitor. It's easy to understand to monitor the data lineage from source to target."
  • "The best feature of IBM InfoSphere DataStage for me was that it was very much user-friendly. The solution didn't require that much raw coding because most of its features were drag and drop, plus it had a large number of functionalities."
  • "Compared to other ETL tools, DataStage has excellent debugging and development capabilities. And the availability of connectors, even though we sometimes have to opt for specific ones. Also, the availability of patches is good."
  • "The product is a stable and powerful data management solution that can run in parallel mode for enhanced speed."
  • "As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
  • "The solution's scalability is really good...we are using multi-instance jobs where you can scale them easily."

Cons

IBM Cloud Pak for Data
  • "The interface could improve because sometimes it becomes slow. Sometimes there is a delay between clicks when using the software, which can make the development process slow. It can take a few seconds to complete one action, and then a few more seconds to do the next one."
  • "One thing that bugs me is how much infrastructure Cloud Pak requires for the initial deployment. It doesn't allow you to start small. The smallest permitted deployment is too big. It's a huge problem that prevents us from implementing the solution in many scenarios."
  • "The product must improve its performance."
  • "There is a solution that is part of IBM Cloud Pak for Data called Watson OpenScale. It is used to monitor the deployed models for the quality and fairness of the results. This is one area that needs a lot of improvement."
  • "The solution could have more connectors."
  • "The tool depends on the control plane, an OpenShift container platform utilized as an orchestration layer...So, we have communicated this issue to IBM and asked if it is feasible to adapt the solution to work on a Kubernetes platform that we support."
  • "Cloud Pak would be improved with integration with cloud service providers like Cloudera."
  • "One challenge I'm facing with IBM Cloud Pak for Data is native features have been decommissioned, such as XML input and output. Too many changes have been made, and my company has around one hundred thousand mappings, so my team has been putting more effort into alternative ways to do things. Another area for improvement in IBM Cloud Pak for Data is that it's more complicated to shift from on-premise to the cloud. Other vendors provide secure agents that easily connect with your existing setup. Still, with IBM Cloud Pak for Data, you have to perform connection migration steps, upgrade to the latest version, etc., which makes it more complicated, especially as my company has XML-based mappings. Still, the XML input and output capabilities of IBM Cloud Pak for Data have been discontinued, so I'd like IBM to bring that back."

IBM InfoSphere DataStage
  • "The response time from support is slow and needs to be improved."
  • "The troubleshooting guide is very bad."
  • "The documentation and in-application help for this solution need to be improved, especially for new features."
  • "The setup is extremely difficult."
  • "Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
  • "The pricing should be lower."
  • "I really like this tool, but the administration should be on the same client application because a lot of administration features are not on the client-side, and they usually need to have administrative access. It's quite complicated to force IT teams to have separate administrative access from the developers."
  • "In the future, I would like to see more integration with cloud technologies."

Pricing and Cost Advice

IBM Cloud Pak for Data
  • "I think that this product is too expensive for smaller companies."
  • "I don't have the exact licensing cost for IBM Cloud Pak for Data, as my company is still finalizing requirements, including monthly, yearly, and three-year licensing fees. Still, on a scale of one to five, I'd rate it a three because, compared to other vendors, it's more complicated."
  • "Cloud Pak's cost is a little high."
  • "IBM Cloud Pak for Data is expensive. If we include the training time and the machine learning, it's expensive. The cost of the execution is more reasonable."
  • "For the licensing of the solution, there is a yearly payment that needs to be made. Also, since it is expensive, cost-wise, I rate the solution an eight or nine out of ten."
  • "It's quite expensive."
  • "The solution is expensive."

IBM InfoSphere DataStage
  • "High-cost of ownership: They could take a page from open source software."
  • "Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
  • "Small and medium-sized companies cannot afford to pay for this solution."
  • "The cost is too high."
  • "It's very expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "The price is expensive but there are no licensing fees."
  • "It is quite expensive."

    Questions from the Community
    Top Answer: DataStage allows me to connect to different data sources.
    Top Answer: The product must improve its performance. We see typical cloud-related issues in the solution. IBM can still focus more on keeping the performance up and keeping it 100% available all the time.
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the…
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work…
    Top Answer: IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands…
    Ranking
    Metric                      IBM Cloud Pak for Data    IBM InfoSphere DataStage
    Rank in Data Integration    17th out of 101           7th out of 101
    Views                       4,032                     10,952
    Comparisons                 2,639                     9,105
    Reviews                     9                         16
    Average Words per Review    500                       467
    Rating                      8.4                       7.9
    Also Known As
    Cloud Pak for Data
    Overview

    IBM Cloud Pak® for Data is a fully-integrated data and AI platform that modernizes how businesses collect, organize and analyze data to infuse AI throughout their organizations. Cloud-native by design, the platform unifies market-leading services spanning the entire analytics lifecycle. From data management, DataOps, and governance to business analytics and automated AI, IBM Cloud Pak for Data helps eliminate the need for costly, and often competing, point solutions while providing the information architecture you need to implement AI successfully.

    Building on the streamlined hybrid-cloud foundation of Red Hat® OpenShift®, IBM Cloud Pak for Data takes advantage of the underlying resource and infrastructure optimization and management. The solution fully supports multicloud environments such as Amazon Web Services (AWS), Azure, Google Cloud, IBM Cloud™ and private cloud deployments. Find out how IBM Cloud Pak for Data can lower your total cost of ownership and accelerate innovation.

    IBM InfoSphere DataStage is a data integration tool for designing, developing, and running jobs that move and transform data, aimed at organizations of different sizes. The product integrates data across multiple systems through a high-performance parallel framework. It supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server, providing a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. The tool works with various types of patterns - extract, transform and load (ETL), and extract, load, and transform (ELT). The scalability of the platform is achieved by using parallel processing and enterprise connectivity.
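    The difference between the two patterns named above can be illustrated with a minimal sketch. It does not use DataStage or any IBM API: it relies only on Python's standard `sqlite3` module as a stand-in target system, and the sample rows, table names, and function names are hypothetical. In the ETL variant the data is cleaned in application code before loading, while in the ELT variant the raw data is loaded first and cleaned inside the target with SQL.

```python
import sqlite3

# Hypothetical source rows (customer, amount as text); not taken from the article.
SOURCE_ROWS = [("acme", " 10.5 "), ("globex", "7"), ("acme", "2.5")]

TOTALS_QUERY = (
    "SELECT customer, SUM(amount) FROM sales GROUP BY customer ORDER BY customer"
)


def run_etl(rows):
    """ETL: transform in application code, then load the cleaned result."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (customer TEXT, amount REAL)")
    # Transform step happens before anything reaches the target.
    cleaned = [(name.upper(), float(amount.strip())) for name, amount in rows]
    conn.executemany("INSERT INTO sales VALUES (?, ?)", cleaned)
    return conn.execute(TOTALS_QUERY).fetchall()


def run_elt(rows):
    """ELT: load raw data first, then transform inside the target with SQL."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE raw_sales (customer TEXT, amount TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", rows)
    # Transform step runs in the target database itself.
    conn.execute(
        "CREATE TABLE sales AS "
        "SELECT UPPER(customer) AS customer, "
        "CAST(TRIM(amount) AS REAL) AS amount "
        "FROM raw_sales"
    )
    return conn.execute(TOTALS_QUERY).fetchall()
```

    Both functions produce the same per-customer totals; what differs is where the transformation work runs, which is typically the deciding factor when choosing between the two patterns.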

    The solution has various versions, catering to different types of companies, which include the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on which version a company has bought, different goals can be achieved. They include the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Utilization of InfoSphere Information Server tools for accelerating the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool can be accessed through a subscription model, where its capabilities are a part of IBM DataStage on IBM Cloud Pak for Data as a Service. This option offers full management on IBM Cloud.

    • On premises or in any cloud: The two editions - IBM DataStage Enterprise and IBM DataStage Enterprise Plus - can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    The tool has various features through which users can integrate and utilize their data effectively. The components of IBM InfoSphere DataStage include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and offers a connection with IBM Cloud Paks - the cloud-native insight platform of the solution.

    • Parallel engine: Through this feature, ETL performance can be optimized to process data at scale. This is achieved through a parallel engine and load balancing, which maximize throughput.

    • Metadata support: This feature of the product uses the IBM Watson Knowledge Catalog to protect companies' sensitive data and monitor who can access it and at what levels.

    • Automated delivery pipelines: IBM InfoSphere DataStage reduces costs by automating continuous integration and delivery of pipelines.

    • Prebuilt connectors: The feature for prebuilt connectivity and stages allows users to move data between multiple cloud sources and data warehouses, including IBM native products.

    • IBM DataStage Flow Designer: This feature offers assistance through machine learning design. The product offers its clients a user-friendly interface which facilitates the work process.

    • IBM InfoSphere QualityStage: The tool provides a feature that automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Through this feature, companies can reduce infrastructure management efforts, relying on the automated detection that the tool offers.

    • Distributed data processing: Through this feature, cloud runtimes can be executed remotely while maintaining data sovereignty and decreasing costs.
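    The general idea behind a parallel engine, as described in the feature list above, is to split the input into partitions and transform the partitions concurrently. The sketch below is only an illustration of that idea using Python's standard `concurrent.futures` module. It is not how DataStage is implemented, the function names and the doubling transform are hypothetical, and Python threads do not provide true CPU parallelism, so it shows the structure rather than the speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(rows, n):
    """Round-robin partitioning: spread rows evenly over n partitions."""
    parts = [[] for _ in range(n)]
    for index, row in enumerate(rows):
        parts[index % n].append(row)
    return parts


def transform(part):
    """Hypothetical per-partition transform: double every value."""
    return [value * 2 for value in part]


def run_parallel(rows, workers=4):
    """Partition the rows, transform partitions concurrently, merge results."""
    parts = partition(rows, workers)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        results = list(pool.map(transform, parts))
    # Merge the partition outputs back into a single sorted list.
    return sorted(value for part in results for value in part)
```

    Round-robin partitioning keeps the partitions close to equal in size, which is one simple way to balance load across workers.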

    IBM InfoSphere DataStage Benefits

    This solution offers many benefits for the companies that utilize it for data integration. Some of these benefits include:

    • Increased speed of workload execution due to better balancing and a parallel engine.

    • Reduction of data movement costs through integrations and seamless design of jobs.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • Utilization of a drag-and-drop interface which assists in the delivery of data without the need for code.

    • Effective data manipulation allows data to be merged before being mapped and transformed.

    • Easier user access to data through visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    Sample Customers
    Qatar Development Bank, GuideWell, Skanderborg Music Festival
    Dubai Statistics Center, Etisalat Egypt
    Top Industries
    VISITORS READING REVIEWS
    Financial Services Firm: 26%
    Computer Software Company: 10%
    Manufacturing Company: 8%
    Government: 8%
    REVIEWERS
    Computer Software Company: 50%
    Insurance Company: 14%
    Transportation Company: 7%
    Healthcare Company: 7%
    VISITORS READING REVIEWS
    Financial Services Firm: 26%
    Manufacturing Company: 11%
    Computer Software Company: 10%
    Insurance Company: 8%
    Company Size
    REVIEWERS
    Small Business: 46%
    Large Enterprise: 54%
    VISITORS READING REVIEWS
    Small Business: 17%
    Midsize Enterprise: 7%
    Large Enterprise: 76%
    REVIEWERS
    Small Business: 45%
    Midsize Enterprise: 6%
    Large Enterprise: 49%
    VISITORS READING REVIEWS
    Small Business: 16%
    Midsize Enterprise: 9%
    Large Enterprise: 74%

    IBM Cloud Pak for Data is ranked 17th in Data Integration with 11 reviews while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. IBM Cloud Pak for Data is rated 8.0, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of IBM Cloud Pak for Data writes "A scalable data analytics and digital transformation tool that provides useful features and integrations". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". IBM Cloud Pak for Data is most compared with Azure Data Factory, Informatica Cloud Data Integration, Palantir Foundry, Denodo and IBM InfoSphere Information Server, whereas IBM InfoSphere DataStage is most compared with SSIS, Azure Data Factory, Talend Open Studio, Informatica PowerCenter and IBM InfoSphere Information Server. See our IBM Cloud Pak for Data vs. IBM InfoSphere DataStage report.

    See our list of best Data Integration vendors.

    We monitor all Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.