Confluent vs IBM InfoSphere DataStage comparison

Confluent: 10,171 views | 7,826 comparisons | 100% willing to recommend
IBM InfoSphere DataStage: 11,157 views | 9,214 comparisons | 82% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Confluent and IBM InfoSphere DataStage based on real PeerSpot user reviews.

Find out in this report how the two solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Confluent vs. IBM InfoSphere DataStage Report (Updated: July 2023).
768,886 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
  • "Kafka Connect framework is valuable for connecting to the various source systems where code doesn't need to be written."
  • "With Confluent Cloud we no longer need to handle the infrastructure and the plumbing, which is a concern for Confluent. The other advantage is that all portfolios have access to the data that is being shared."
  • "Their tech support is amazing; they are very good, both on and off-site."
  • "Implementing Confluent's schema registry has significantly enhanced our organization's data quality assurance."
  • "The documentation process is fast with the tool."
  • "I find Confluent's Kafka Connectors and Kafka Streams invaluable for my use cases because they simplify real-time data processing and ETL tasks by providing reliable, pre-packaged connectors and tools."
  • "The most valuable is its capability to enhance the documentation process, particularly when creating software documentation."
  • "The solution can handle a high volume of data because it works and scales well."


  • "The ETL tools are probably the most valuable feature. It has an IBM tool, a friendly UI and it makes things more comfortable."
  • "It's a robust solution."
  • "ETL is the most valuable feature."
  • "The most valuable feature is the data integration for data warehousing."
  • "The most valuable feature is the product's versatility to inject data."
  • "The most valuable feature for our data processing needs is IBM InfoSphere DataStage's capability to handle ETL tasks with large record volumes."
  • "It is quite useful and powerful."
  • "We are mostly using transmission rules. It has a lot of functions and logic related to transmission. It is a user-friendly tool with in-built functions."


Cons
  • "Currently, in the early stages, I see a gap on the security side. If you are using the SaaS version, we would like to get a fuller, more secure solution that can be adopted right out of the box. Confluence could do a better job sharing best practices or a reusable pattern that others have used, especially for companies that can not afford to hire professional services from Confluent."
  • "The Schema Registry service could be improved. I would like a bigger knowledge base of other use cases and more technical forums. It would be good to have more flexible monitoring features added to the next release as well."
  • "there is room for improvement in the visualization."
  • "It could have more themes. They should also have more reporting-oriented plugins as well. It would be great to have free custom reports that can be dispatched directly from Jira."
  • "The formatting aspect within the page can be improved and more powerful."
  • "Confluence could improve the server version of the solution. However, most companies are going to the cloud."
  • "Confluent's price needs improvement."
  • "It could have more integration with different platforms."


  • "In terms of intermediate storage, we have some challenges, especially with customers who store data in intermediate locations."
  • "The setup is extremely difficult."
  • "We would be happy to see in next versions the ability to return several parameters from jobs. Now, jobs can return just one parameter. If they could return several parameters, that would be great."
  • "It would be great if they can include some basic version of data quality checking features."
  • "Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
  • "In the future, I would like to see more integration with cloud technologies."
  • "The troubleshooting guide is very bad."
  • "It doesn't have any big data connections. It would be good to have them because most of the systems are moving towards big data. There should also be a user-friendly way to interact with the cloud. Its loading process is very slow. It takes a lot of time for around 5 or 6 million records, and we are not able to provide real-time data to the vendors due to this delay. Its performance needs to be improved. It is also like a legacy system. It is not updated much. In higher versions, they only do small changes. We would like to have new features and new technologies."


Pricing and Cost Advice
  • "Confluent is expensive, I would prefer, Apache Kafka over Confluent because of the high cost of maintenance."
  • "You have to pay additional for one or two features."
  • "The pricing model of Confluent could improve because if you have a classic use case where you're going to use all the features there is no plan to reduce the features. You should be able to pick and choose basic services at a reduced price. The pricing was high for our needs. We should not have to pay for features we do not use."
  • "On a scale from one to ten, where one is low pricing and ten is high pricing, I would rate Confluent's pricing at five. I have not encountered any additional costs."
  • "Confluence's pricing is quite reasonable, with a cost of around $10 per user that decreases as the number of users increases. Additionally, it's worth noting that for teams of up to 10 users, the solution is completely free."
  • "Confluent has a yearly license, which is a bit high because it's on a per-user basis."
  • "It comes with a high cost."
  • "Confluent is highly priced."

  • "High-cost of ownership: They could take a page from open source software."
  • "Pricing varies based on use, and it is not as costly as some competing enterprise solutions."
  • "Small and medium-sized companies cannot afford to pay for this solution."
  • "The cost is too high."
  • "It's very expensive."
  • "Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
  • "The price is expensive but there are no licensing fees."
  • "It is quite expensive."

    Questions from the Community
    Top Answer: I find Confluent's Kafka Connectors and Kafka Streams invaluable for my use cases because they simplify real-time data processing and ETL tasks by providing reliable, pre-packaged connectors and…
    Top Answer: I would rate the pricing of Confluent as average, around a five out of ten. Additional costs could include features like multi-tenancy support and native encryption with custom algorithms, which would…
    Top Answer: Areas for improvement include implementing multi-storage support to differentiate between database stores based on data age and optimizing storage costs, as well as enhancing the offset management…
    Top Answer: My company currently uses the free version of the product, and we are definitely switching to a paid one. We needed a tool that can help us not only integrate our data but use it effectively. For the…
    Top Answer: I think the tool may cause some difficulties if you have not used other data integration solutions before. I have worked at companies that used different tools for data integration, and they work…
    Top Answer: IBM Cloud Paks makes a big difference in your data integration. My company has been using it alongside IBM InfoSphere DataStage and while the main product is good on its own, this one truly expands…
    Metric                    Confluent                               IBM InfoSphere DataStage
    Ranking                   3rd out of 38 in Streaming Analytics    7th out of 100 in Data Integration
    Views                     10,171                                  11,157
    Comparisons               7,826                                   9,214
    Reviews                   11                                      15
    Average words per review  413                                     452
    Rating                    8.5                                     7.9
    Overview

    Confluent is an enterprise-ready, full-scale streaming platform that enhances Apache Kafka. 

    Confluent adds features designed to:

    • Speed up application development and connectivity
    • Enable transformations through stream processing
    • Streamline business operations at scale
    • Adhere to strict architectural standards

    Confluent is a more complete distribution of Kafka: it extends Kafka's integration options with tools for managing and optimizing Kafka clusters and for keeping streams secure. Confluent supports publish-and-subscribe messaging as well as storing and processing data within streams, and it makes Kafka easier to operate and to build applications on.
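    To make the publish-and-subscribe and stream-storage model more concrete, here is a minimal in-memory sketch in Python. It does not use Confluent's client libraries or Kafka's API; the class `TopicLog` and its methods are hypothetical. It only mimics the Kafka ideas that a topic is an append-only log split into partitions, that records with the same key go to the same partition, and that each consumer group tracks its own read offsets, so several subscribers can independently read the same stored records.

```python
from collections import defaultdict
import zlib


class TopicLog:
    """Hypothetical in-memory stand-in for a partitioned, append-only Kafka topic."""

    def __init__(self, num_partitions: int = 3) -> None:
        self.partitions: list[list[tuple[str, str]]] = [[] for _ in range(num_partitions)]
        # Next offset to read, tracked separately per (consumer group, partition).
        self.offsets: defaultdict[tuple[str, int], int] = defaultdict(int)

    def publish(self, key: str, value: str) -> tuple[int, int]:
        """Append a record and return its (partition, offset)."""
        # Records with the same key always land in the same partition, which keeps
        # their relative order. CRC32 is only a deterministic placeholder hash;
        # real Kafka clients choose their own partitioning hash.
        p = zlib.crc32(key.encode()) % len(self.partitions)
        self.partitions[p].append((key, value))
        return p, len(self.partitions[p]) - 1

    def poll(self, group: str) -> list[tuple[str, str]]:
        """Return records this group has not read yet, then advance its offsets."""
        batch: list[tuple[str, str]] = []
        for p, log in enumerate(self.partitions):
            start = self.offsets[(group, p)]
            batch.extend(log[start:])
            self.offsets[(group, p)] = len(log)
        return batch


if __name__ == "__main__":
    topic = TopicLog()
    topic.publish("order-1", "created")
    topic.publish("order-1", "paid")
    print(topic.poll("billing"))     # both records, in publish order
    print(topic.poll("billing"))     # nothing new for this group
    print(len(topic.poll("audit")))  # a second group still reads both stored records
```

    Because records are stored rather than discarded after delivery, the "audit" group can read data the "billing" group has already consumed; that retention is what lets a streaming platform both distribute and store events.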

    Confluent's software is available in three editions: 

    1. A free, open-source streaming platform that makes it simple to start using real-time data streams
    2. An enterprise-grade version with additional administration, operations, and monitoring tools
    3. A premium cloud-based version

    Confluent Features

    Confluent's key features include:

    • Multi-language

      • Clients: C++, Python, Go, and .NET
      • REST proxy: Connect to Kafka from any network-connected device
      • Admin REST APIs: RESTful interface for performing administrator operations
    • Pre-built ecosystem

      • Connectors: More than 100 supported connectors, including S3, Elastic, HDFS, JDBC
      • MQTT proxy: Gain access to Kafka from MQTT gateways and devices
      • Schema registry: Central repository of schemas that enforces data compatibility
    • Streaming database

      • ksqlDB: Materialized views and real-time stream processing
    • GUI management

      • Control Center: GUI for scalable Kafka management and monitoring
      • Health+: Smart alerts and cloud-based monitoring
    • Flexible DevOps automation

      • Confluent for Kubernetes: Complete API for deploying on Kubernetes
      • Automated Ansible deployment for non-containerized environments
    • Dynamic performance

      • Self-balancing clusters: Automated partition rebalancing across the brokers in a cluster
      • Tiered storage: Offloads older Kafka data to object storage while keeping it transparently accessible
    • Enterprise-grade security

      • Role-based access control: Granular user and group access authorization
      • Structured audit logs: Records of user actions kept in dedicated Kafka topics
      • Secret protection: Encryption of sensitive information
    • Global resilience

      • Cluster Linking: A real-time, highly reliable, and consistent bridge between on-premises and cloud environments
      • Multi-region clusters: A single Kafka cluster distributed across multiple data centers, with automated client failover
      • Replicator: Asynchronous replication built on the Kafka Connect framework
    • Support

      • Round-the-clock enterprise support from Kafka experts
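    The schema registry's role in guaranteeing data compatibility can be illustrated with a simplified backward-compatibility check. In Confluent Schema Registry, BACKWARD compatibility (the default mode) means consumers using the new schema can read data written with the previous schema; for record schemas, that roughly means a newly added field needs a default value, while deleting a field is allowed. The function below is a hypothetical, heavily simplified sketch, not the registry's actual algorithm: it models a schema as a flat mapping of field names to a type name and a "has default" flag, and it ignores nested records, unions, aliases, and type promotions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    type: str                 # simplified: a primitive type name such as "string" or "int"
    has_default: bool = False


# A schema is modeled as field name -> Field (no nesting, unions, or aliases).
Schema = dict[str, Field]


def is_backward_compatible(old: Schema, new: Schema) -> tuple[bool, list[str]]:
    """Simplified check: can a reader using `new` decode data written with `old`?"""
    problems: list[str] = []
    for name, field in new.items():
        if name not in old:
            # Old data lacks this field, so the reader must fall back to a default.
            if not field.has_default:
                problems.append(f"added field '{name}' has no default")
        elif old[name].type != field.type:
            # Real Avro permits some promotions (e.g. int -> long); ignored here.
            problems.append(f"field '{name}' changed type {old[name].type} -> {field.type}")
    # Fields that exist only in `old` are skipped by the reader, so removal is allowed.
    return not problems, problems


if __name__ == "__main__":
    v1 = {"id": Field("string"), "amount": Field("int")}
    v2_ok = {**v1, "currency": Field("string", has_default=True)}
    v2_bad = {"id": Field("string"), "currency": Field("string")}
    print(is_backward_compatible(v1, v2_ok))   # (True, [])
    print(is_backward_compatible(v1, v2_bad))  # (False, ["added field 'currency' has no default"])
```

    Returning the list of problems alongside the boolean mirrors the practical need to tell a producer why a schema change would be rejected, not just that it would be.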

    Reviews from Real Users

    Confluent stands out among its competitors for a number of reasons. Two major ones are its robust enterprise support and its open source option. PeerSpot users take note of the advantages of these features in their reviews: 

    Ravi B., a solutions architect at a tech services company, writes of the solution, “KSQL is a valuable feature, as is the Kafka Connect framework for connecting to the various source systems where you need not write the code. We get great support from Confluent because we're using the enterprise version and whenever there's a problem, they support us with fine-tuning and finding the root cause.”

    Amit S., an IT consultant, notes, “The biggest benefit is that it is open source. You have the flexibility of opting or not opting for enterprise support, even though the tool itself is open source.” He adds, “The second benefit is it's very modern and built on Java and Scala. You can extend the features very well, and it doesn't take a lot of effort to do so.”

    IBM InfoSphere DataStage is a data integration tool for designing, developing, and running jobs that move and transform data, aimed at organizations of all sizes. It integrates data across multiple systems through a high-performance parallel framework and supports extended metadata management, enterprise connectivity, and integration of all types of data.

    The solution is the data integration component of IBM InfoSphere Information Server and provides a graphical framework for moving data from source systems to target systems. IBM InfoSphere DataStage can deliver data to data warehouses, data marts, operational data sources, and other enterprise applications. It supports both the extract, transform, and load (ETL) and the extract, load, and transform (ELT) patterns, and it scales through parallel processing and enterprise connectivity.
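    The practical difference between the ETL and ELT patterns is where the transformation runs. The Python sketch below is purely illustrative and is unrelated to how DataStage jobs are actually built; the source rows, table names, and functions are hypothetical, and an in-memory SQLite database stands in for the target system. In the ETL path, rows are cleaned before loading; in the ELT path, raw rows are loaded first and then transformed with SQL inside the target.

```python
import sqlite3

# Hypothetical source rows: (customer name, amount in cents, stored as text).
SOURCE = [(" Alice ", "1250"), ("bob", "300"), ("Alice", "50")]


def etl(conn: sqlite3.Connection) -> None:
    """ETL: clean the rows in the integration layer, then load only clean data."""
    clean = [(name.strip().title(), int(cents) / 100) for name, cents in SOURCE]
    conn.execute("CREATE TABLE sales_etl (customer TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales_etl VALUES (?, ?)", clean)


def elt(conn: sqlite3.Connection) -> None:
    """ELT: load the raw rows as-is, then transform them with SQL inside the target."""
    conn.execute("CREATE TABLE raw_sales (customer TEXT, cents TEXT)")
    conn.executemany("INSERT INTO raw_sales VALUES (?, ?)", SOURCE)
    # Capitalize the first letter only; equivalent to str.title() for single-word names.
    conn.execute(
        """
        CREATE TABLE sales_elt AS
        SELECT upper(substr(trim(customer), 1, 1)) || lower(substr(trim(customer), 2)) AS customer,
               CAST(cents AS INTEGER) / 100.0 AS amount
        FROM raw_sales
        """
    )


def totals(conn: sqlite3.Connection, table: str) -> list[tuple[str, float]]:
    # `table` is interpolated directly, so only pass trusted, fixed table names.
    query = f"SELECT customer, SUM(amount) FROM {table} GROUP BY customer ORDER BY customer"
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    etl(conn)
    elt(conn)
    print(totals(conn, "sales_etl"))  # [('Alice', 13.0), ('Bob', 3.0)]
    print(totals(conn, "sales_elt"))  # same result, transformed inside the database
```

    Both paths end with the same clean table; ELT additionally keeps the raw rows in the target, which can be useful for reprocessing, at the cost of running transformations on the target's compute.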

    The solution comes in several editions for different types of companies, including the Server Edition, the Enterprise Edition, and the MVS Edition. Depending on the edition a company has bought, it can support the following:

    • Designing data flows to extract information from multiple sources, transform the data, and deliver it to target databases or applications.

    • Delivery of relevant and accurate data through direct connections to enterprise applications.

    • Reduction of development time and improvement of consistency through prebuilt functions.

    • Use of InfoSphere Information Server tools to accelerate the project delivery cycle.

    IBM InfoSphere DataStage can be deployed in various ways, including:

    • As a service: The tool is available by subscription, with its capabilities included in IBM DataStage on IBM Cloud Pak for Data as a Service. This option is fully managed on IBM Cloud.

    • On premises or in any cloud: The IBM DataStage Enterprise and IBM DataStage Enterprise Plus editions can run workloads on premises or in any cloud when added to IBM DataStage on IBM Cloud Pak for Data as a Service.

    • On premises: The basic jobs of the tool can be run on premises using IBM DataStage.

    IBM InfoSphere DataStage Features

    IBM InfoSphere DataStage offers various features for integrating and using data effectively. Its components include:

    • AI services: The tool offers services such as data science, event messaging, data warehousing, and data virtualization. It accelerates processes through artificial intelligence (AI) and connects with IBM Cloud Paks, IBM's cloud-native insight platform.

    • Parallel engine: Optimizes ETL performance for processing data at scale, using a parallel engine and load balancing to maximize throughput.

    • Metadata support: Uses IBM Watson Knowledge Catalog to help protect companies' sensitive data and monitor who can access it and at what level.

    • Automated delivery pipelines: Reduces costs by automating the continuous integration and delivery (CI/CD) of pipelines.

    • Prebuilt connectors: Prebuilt connectivity and stages let users move data between multiple cloud sources and data warehouses, including IBM's native products.

    • IBM DataStage Flow Designer: Provides machine-learning-assisted job design through a user-friendly interface that simplifies the work process.

    • IBM InfoSphere QualityStage: Automatically resolves data quality issues and increases the reliability of the delivered data.

    • Automated failure detection: Reduces infrastructure management effort by detecting failures automatically.

    • Distributed data processing: Executes cloud runtimes remotely while maintaining data sovereignty and reducing costs.
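    The idea behind a parallel engine, splitting data into partitions and processing them concurrently, can be sketched with the Python standard library. DataStage parallel jobs offer partitioning methods such as hash and round robin; this sketch uses only hash partitioning on a key, and the functions `hash_partition`, `aggregate`, and `parallel_totals` are hypothetical illustrations rather than anything from DataStage itself. Hashing on the grouping key guarantees that all rows for one region land in the same partition, so each worker can compute complete totals on its own and the results can be combined without further aggregation.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Any
import zlib

Row = dict[str, Any]


def hash_partition(rows: list[Row], key: str, n: int) -> list[list[Row]]:
    """Send every row with the same key value to the same partition."""
    parts: list[list[Row]] = [[] for _ in range(n)]
    for row in rows:
        parts[zlib.crc32(str(row[key]).encode()) % n].append(row)
    return parts


def aggregate(part: list[Row]) -> dict[str, float]:
    """Work done independently on one partition: total amount per region."""
    totals: dict[str, float] = defaultdict(float)
    for row in part:
        totals[str(row["region"])] += float(row["amount"])
    return dict(totals)


def parallel_totals(rows: list[Row], n: int = 4) -> dict[str, float]:
    parts = hash_partition(rows, "region", n)
    # Threads keep the sketch simple; CPU-bound work in CPython would use processes.
    with ThreadPoolExecutor(max_workers=n) as pool:
        partials = list(pool.map(aggregate, parts))
    merged: dict[str, float] = {}
    for partial in partials:
        # Safe to merge without summing: hash partitioning placed each region in one partition.
        merged.update(partial)
    return merged


if __name__ == "__main__":
    sales = [
        {"region": "EU", "amount": 10.0},
        {"region": "US", "amount": 5.0},
        {"region": "EU", "amount": 2.5},
    ]
    print(parallel_totals(sales))  # EU totals 12.5 and US totals 5.0 (key order may vary)
```

    With round-robin partitioning instead, rows for one region could be spread across workers, and the merge step would have to sum the partial totals rather than simply combine them.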

    IBM InfoSphere DataStage Benefits

    Benefits for companies that use this solution for data integration include:

    • Faster workload execution through better load balancing and a parallel engine.

    • Lower data movement costs through integrations and streamlined job design.

    • Modernization of data integration by extending the capabilities of companies' data.

    • Delivery of reliable data through IBM Cloud Pak for Data.

    • A drag-and-drop interface for delivering data without the need for code.

    • Effective data manipulation, in which data can be merged before it is mapped and transformed.

    • Easier access for users to their data through visual maps of the process and the delivered data.

    Reviews from Real Users

    A data/solution architect at a computer software company says the product is robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data.

    Tirthankar Roy Chowdhury, team leader at Tata Consultancy Services, feels the tool is user-friendly with a lot of functionalities, and doesn't require much coding because of its drag-and-drop features.

    Sample Customers
    Confluent: ING, Priceline.com, Nordea, Target, RBC, Tivo, Capital One, Chartboost
    IBM InfoSphere DataStage: Dubai Statistics Center, Etisalat Egypt
    Top Industries
    Confluent
      Reviewers: Computer Software Company 31%, Retailer 15%, Non Tech Company 8%, Government 8%
      Visitors reading reviews: Financial Services Firm 19%, Computer Software Company 17%, Manufacturing Company 8%, Retailer 6%
    IBM InfoSphere DataStage
      Reviewers: Computer Software Company 50%, Insurance Company 14%, Transportation Company 7%, Healthcare Company 7%
      Visitors reading reviews: Financial Services Firm 26%, Manufacturing Company 11%, Computer Software Company 10%, Insurance Company 7%
    Company Size
    Confluent
      Reviewers: Small Business 26%, Midsize Enterprise 21%, Large Enterprise 53%
      Visitors reading reviews: Small Business 19%, Midsize Enterprise 12%, Large Enterprise 69%
    IBM InfoSphere DataStage
      Reviewers: Small Business 45%, Midsize Enterprise 6%, Large Enterprise 49%
      Visitors reading reviews: Small Business 16%, Midsize Enterprise 9%, Large Enterprise 75%
    Buyer's Guide
    Confluent vs. IBM InfoSphere DataStage
    July 2023
    Find out what your peers are saying about Confluent vs. IBM InfoSphere DataStage and other solutions. Updated: July 2023.

    Confluent is ranked 3rd in Streaming Analytics with 19 reviews, while IBM InfoSphere DataStage is ranked 7th in Data Integration with 37 reviews. Confluent is rated 8.4, while IBM InfoSphere DataStage is rated 7.8. The top reviewer of Confluent writes "Has good technical support services and a valuable feature for real-time data streaming". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "User-friendly with a lot of functions for transmission rules, but has slow performance and not suitable for a huge volume of data". Confluent is most compared with Amazon MSK, Amazon Kinesis, Databricks, AWS Glue, and Oracle GoldenGate, whereas IBM InfoSphere DataStage is most compared with SSIS, IBM Cloud Pak for Data, Azure Data Factory, Talend Open Studio, and Informatica PowerCenter.

    We monitor all Streaming Analytics reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.