AWS Glue vs StreamSets comparison

AWS Glue: 12,012 views | 8,420 comparisons | 92% willing to recommend
StreamSets: 4,226 views | 2,398 comparisons | 100% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between AWS Glue and StreamSets based on real PeerSpot user reviews.

Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed AWS Glue vs. StreamSets Report (Updated: March 2024).
767,319 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros

AWS Glue:
  • "The solution is serverless so it allows us to transform data while optimizing the cost and performance of Spark jobs."
  • "The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
  • "The solution's technical support is good. Whenever we raise a use case where we face an issue in our company, we get a response from the solution's technical team."
  • "Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
  • "The most valuable feature of AWS Glue is its ease of use and good documentation. Additionally, we can do all the transformations that we need."
  • "It is a stable and scalable solution."
  • "One of the best features of the solution is its ability to easily integrate with other AWS services."
  • "It's fairly straightforward as a product; it's not very complicated."

StreamSets:
  • "The Ease of configuration for pipes is amazing. It has a lot of connectors. Mainly, we can do everything with the data in the pipe. I really like the graphical interface too."
  • "I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
  • "For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
  • "What I love the most is that StreamSets is very light. It's a containerized application. It's easy to use with Docker. If you are a large organization, it's very easy to use Kubernetes."
  • "It is really easy to set up and the interface is easy to use."
  • "It's very easy to integrate. It integrates with Snowflake, AWS, Google Cloud, and Azure. It's very helpful for DevOps, DataOps, and data engineering because it provides a comprehensive solution, and it's not complicated."
  • "The best feature that I really like is the integration."
  • "In StreamSets, everything is in one place."

Cons

AWS Glue:
  • "The price of the solution could improve."
  • "On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded."
  • "The solution’s stability could be improved."
  • "There should be more connectors for different databases."
  • "The solution could be cheaper. The price of the solution is an area that needs improvement."
  • "It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
  • "The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
  • "In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement."

StreamSets:
  • "One area for improvement could be the cloud storage server speed, as we have faced some latency issues here and there."
  • "I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
  • "We create pipelines or jobs in StreamSets Control Hub. It is a great feature, but if there is a way to have a folder structure or organize the pipelines and jobs in Control Hub, it would be great. I submitted a ticket for this some time back."
  • "They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved."
  • "In terms of the product, I don't think there is any room for improvement because it is very good. One small area of improvement that is very much needed is on the knowledge base side. Sometimes, it is not very clear how to set up a certain process or a certain node for a person who's using the platform for the first time."
  • "Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
  • "I would like to see it integrate with other kinds of platforms, other than Java. We're going to have a lot of applications using .NET and other languages or frameworks. StreamSets is very helpful for the old Java platform but it's hard to integrate with the other platforms and frameworks."
  • "There aren't enough hands-on labs, and debugging is also an issue because it takes a lot of time. Logs are not that clear when you are debugging, and you can only select a single source for a pipeline."

Pricing and Cost Advice

AWS Glue:
  • "The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
  • "It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
  • "Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."
  • "Technical support is a paid service, and which subscription you have is dependent on that. You must pay one of them, and it ranges from $15,000 to $25,000 per year."
  • "This solution is affordable and there is an option to pay for the solution based on your usage."
  • "AWS Glue is quite costly, especially for small organizations."
  • "AWS Glue uses a pay-as-you-go approach which is helpful. The price of the overall solution is low and is a great advantage."
  • "The overall cost of AWS Glue could be better. It cost approximately $1,000 a month. There is paid support available from AWS Glue."

StreamSets:
  • "We are running the community version right now, which can be used free of charge."
  • "StreamSets Data Collector is open source. One can utilize the StreamSets Data Collector, but the Control Hub is the main repository where all the jobs are present. Everything happens in Control Hub."
  • "It has a CPU core-based licensing, which works for us and is quite good."
  • "There are different versions of the product. One is the corporate license version, and the other one is the open-source or free version. I have been using the corporate license version, but they have recently launched a new open-source version so that anybody can create an account and use it. The licensing cost varies from customer to customer. I don't have a lot of input on that. It is taken care of by PMO, and they seem fine with its pricing model. It is being used enterprise-wide. They seem to have got a good deal for StreamSets."
  • "The pricing is good, but not the best. They have some customized plans you can opt for."
  • "We use the free version. It's great for a public, free release. Our stance is that the paid support model is too expensive to get into. They should honestly reevaluate that."
  • "The overall cost for small and mid-size organizations needs to be better."
  • "There are two editions, Professional and Enterprise, and there is a free trial. We're using the Professional edition and it is competitively priced."
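
To make the arithmetic in the first AWS Glue pricing quote concrete, here is a minimal sketch that multiplies the reviewer's figure of 44 cents per DPU-hour by the number of DPUs and the run time. The rate is taken from the quote only and may not match current AWS pricing. The function name `job_cost` and the simple per-minute proration are assumptions for illustration; any billing minimums or time increments, such as the ones the reviewer mentions, are not modeled.

```python
from decimal import Decimal

# Rate quoted by the reviewer (USD per DPU-hour); not verified against current AWS pricing.
QUOTED_RATE_PER_DPU_HOUR = Decimal("0.44")


def job_cost(dpus, minutes, rate=QUOTED_RATE_PER_DPU_HOUR):
    """Estimate the cost of one job run as DPUs x hours x rate.

    Hypothetical helper: assumes straight per-minute proration and
    ignores any billing minimums or rounding increments.
    """
    hours = Decimal(minutes) / Decimal(60)
    return (Decimal(dpus) * hours * rate).quantize(Decimal("0.01"))


# Doubling the DPUs doubles the estimate for the same run time.
for dpus in (1, 5, 10):
    print(f"{dpus} DPU(s) for 60 minutes: ${job_cost(dpus, 60)}")
```

Under these assumptions, the cost grows linearly with both the DPU count and the run time, which is the multiplication effect the reviewer describes.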

    Questions from the Community
    Top Answer: AWS Glue and Azure Data Factory for ELT best performance cloud services.
    Top Answer: We reviewed AWS Glue before choosing Talend Open Studio. AWS Glue is the managed ETL (extract, transform, and load) from Amazon Web Services. AWS Glue enables AWS users to create and manage jobs in…
    Top Answer: AWS Glue's main use case is for allowing users to discover, prepare, move, and integrate data from multiple sources. The product lets you use this data for analytics, application development, or…
    Top Answer: I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines…
    Top Answer: StreamSets should provide a mechanism to be able to perform data quality assessment when the data is being moved from one source to the target. So the ability to validate the data against various data…
    Top Answer: We are using StreamSets to migrate our on-premise data to the cloud.
    Ranking
    Metric                      AWS Glue    StreamSets
    Ranking                     1st         8th (out of 100 in Data Integration)
    Views                       12,012      4,226
    Comparisons                 8,420       2,398
    Reviews                     32          21
    Average Words per Review    419         1,337
    Rating                      7.8         8.4
    Overview

    AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for authoring and running jobs and for implementing business workflows.

    AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution supports visually creating, running, and monitoring extract, transform, and load (ETL) pipelines that load data into users' data lakes. It integrates with other AWS services and allows users to search and query cataloged data using Amazon EMR, Amazon Athena, and Amazon Redshift Spectrum.

    The solution also utilizes application programming interface (API) operations to transform users' data, create runtime logs, store job logic, and create notifications for monitoring job runs. The console of AWS Glue connects all of these services into a managed application, facilitating the monitoring and operational processes. The solution also performs provisioning and management of the resources required to run users' workloads in order to minimize manual work time for organizations.

    AWS Glue Features

    AWS Glue groups its features into four categories: discover, prepare, integrate, and transform. Within those groups are the following features:

    • Automatic schema discovery: AWS Glue crawlers connect to the organization's source or target data source through a prioritized list of classifiers to determine the schema for users' data. This feature creates metadata in companies' AWS Glue Data Catalog.

    • Schemas for data stream management: The AWS Glue Schema Registry enables users to validate and control the evolution of streaming data through registered Apache Avro schemas for no additional charge.

    • Automatic scaling based on workload: This feature dynamically scales resources up and down based on workload. It adds and removes job resources depending on how much the workload can be split up.

    • FindMatches: This feature is for machine learning-based data deduplication and cleansing, and works by finding records that are imperfect matches of each other to remove useless data copies.

    • Edit, debug, and test ETL code: This feature helps users who have chosen to interactively develop their ETL code by providing development endpoints for editing, debugging, and testing the code it generates for them.

    • AWS Glue DataBrew: An interactive, point-and-click visual interface for specialists to clean and normalize data without the need to write any code.

    • AWS Glue Interactive Sessions: This feature simplifies the development of data integration jobs by enabling data engineers to interactively prepare and explore data.

    • AWS Glue Studio Job Notebooks: This AWS Glue feature provides serverless notebooks with minimal setup, allowing developers to start working in a timely manner.

    • Complex ETL pipeline building: This feature allows the product to be invoked on a schedule, on demand, or based on an event, allowing users to start multiple jobs in parallel or specify dependencies to build complex ETL pipelines.

    • AWS Glue Studio: This AWS Glue feature allows users to visually transform data through a drag-and-drop interface. The product automatically generates the code for ETL processes for users' data.
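
The "Complex ETL pipeline building" item above mentions starting multiple jobs in parallel or specifying dependencies between them. The following minimal sketch illustrates that general idea with Python's standard `graphlib` module. It is not the AWS Glue API; the job names and the `plan_batches` helper are hypothetical and exist only for illustration.

```python
from graphlib import TopologicalSorter


def plan_batches(dependencies):
    """Group jobs into batches whose members can run in parallel.

    `dependencies` maps each job name to the set of jobs it depends on.
    A job appears in a batch only after all of its dependencies have
    appeared in an earlier batch.
    """
    sorter = TopologicalSorter(dependencies)
    sorter.prepare()  # raises CycleError if the dependencies are circular
    batches = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # jobs with no unmet dependencies
        batches.append(ready)
        sorter.done(*ready)  # mark the whole batch as finished
    return batches


# Hypothetical pipeline: two extracts feed a join, which feeds a load.
jobs = {
    "extract_orders": set(),
    "extract_customers": set(),
    "join_orders_customers": {"extract_orders", "extract_customers"},
    "load_warehouse": {"join_orders_customers"},
}

for number, batch in enumerate(plan_batches(jobs), start=1):
    print(f"batch {number}: {', '.join(batch)}")
```

In this sketch the two extract jobs land in the first batch because neither depends on anything, while the join and load jobs each wait for the batch before them.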

    AWS Glue Benefits

    AWS Glue offers a wide range of benefits for its users. These benefits include:

    • Users of other AWS products can easily onboard with AWS Glue, as it is integrated across a wide range of the company's services.

    • The solution is serverless, which allows for a lower total cost of ownership.

    • AWS Glue offers more power for users, as it automates much of the effort in building, maintaining, and running ETL jobs.

    • The product allows customers to easily discover and search across all their AWS datasets through AWS Glue Data Catalog.

    • AWS Glue does not require additional payment for managing and enforcing schemas for data streams.

    • The solution facilitates the authoring of scalable ETL jobs for beginners and non-coding experts through a drag-and-drop interface.

    Reviews from Real Users

    Mustapha A., a cloud data engineer at Jems Groupe, likes AWS Glue because it works well for serverless data transformations.

    Liana I., CEO at Quark Technologies SRL, describes AWS Glue as highly scalable and reliable, with a beneficial pay-as-you-go pricing model.

    StreamSets is a data integration platform that enables organizations to efficiently move and process data across various systems. It offers a user-friendly interface for designing, deploying, and managing data pipelines, allowing users to easily connect to various data sources and destinations. StreamSets also provides real-time monitoring and alerting capabilities, ensuring that data is flowing smoothly and any issues are quickly addressed.

    Sample Customers
    AWS Glue: bp, Cerner, Expedia, FINRA, HESS, Intuit, Kellogg's, Philips, TIME, Workday
    StreamSets: Availity, BT Group, Humana, Deluxe, GSK, RingCentral, IBM, Shell, SamTrans, State of Ohio, TalentFulfilled, TechBridge
    Top Industries
    AWS Glue, reviewers: Computer Software Company 47%, Financial Services Firm 18%, Pharma/Biotech Company 12%, Consumer Goods Company 6%
    AWS Glue, visitors reading reviews: Financial Services Firm 20%, Computer Software Company 14%, Manufacturing Company 7%, Insurance Company 7%
    StreamSets, reviewers: Financial Services Firm 20%, Energy/Utilities Company 20%, Comms Service Provider 13%, Computer Software Company 13%
    StreamSets, visitors reading reviews: Financial Services Firm 17%, Computer Software Company 13%, Manufacturing Company 8%, Government 7%
    Company Size
    AWS Glue, reviewers: Small Business 29%, Midsize Enterprise 13%, Large Enterprise 58%
    AWS Glue, visitors reading reviews: Small Business 15%, Midsize Enterprise 12%, Large Enterprise 72%
    StreamSets, reviewers: Small Business 40%, Midsize Enterprise 12%, Large Enterprise 48%
    StreamSets, visitors reading reviews: Small Business 16%, Midsize Enterprise 11%, Large Enterprise 73%

    AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews, while StreamSets is ranked 8th in Data Integration with 24 reviews. AWS Glue is rated 7.8, while StreamSets is rated 8.4. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". The top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration, and Talend Open Studio, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS, and Confluent.


    We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.