We performed a comparison between AWS Glue and StreamSets based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Integration solutions compare in terms of features, pricing, service and support, easy of deployment, and ROI."AWS Glue's most valuable features are the data catalog, including crawlers and tables, and Glue Studio, which means you don't have to use custom code."
"The solution is highly user-friendly, and its features are easy to use. The new addition of AWS Glue Data Catalog is also very beneficial, making the tool even more helpful for its users."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"I like the fact that AWS Glue works with Python scripts."
"I like its integration and ability to handle all data-related tasks."
"The solution integrates well with other AWS products or services."
"The product has a valuable feature for data catalog."
"AWS Glue is a stable and easy-to-use solution."
"The best thing about StreamSets is its plugins, which are very useful and work well with almost every data source. It's also easy to use, especially if you're comfortable with SQL. You can customize it to do what you need. Many other tools have started to use features similar to those introduced by StreamSets, like automated workflows that are easy to set up."
"Important features include that it comprises lots of functionality to connect data from various sources through connector availability, scheduling pipelines at any time, and integration with third-party and security solutions for encryption."
"I really appreciate the numerous ready connectors available on both the source and target sides, the support for various media file formats, and the ease of configuring and managing pipelines centrally."
"The ability to have a good bifurcation rate and fewer mistakes is valuable."
"For me, the most valuable features in StreamSets have to be the Data Collector and Control Hub, but especially the Data Collector. That feature is very elegant and seamlessly works with numerous source systems."
"The best feature that I really like is the integration."
"I have used Data Collector, Transformer, and Control Hub products from StreamSets. What I really like about these products is that they're very user-friendly. People who are not from a technological or core development background find it easy to get started and build data pipelines and connect to the databases. They would be comfortable like any technical person within a couple of weeks."
"The most valuable would be the GUI platform that I saw. I first saw it at a special session that StreamSets provided towards the end of the summer. I saw the way you set it up and how you have different processes going on with your data. The design experience seemed to be pretty straightforward to me in terms of how you drag and drop these nodes and connect them with arrows."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
"The setup and installation is a bit complex without advanced knowledge or training."
"One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools."
"The price of the solution could improve."
"The solution should offer features for streaming data in addition to batching data."
"The solution could be cheaper. The price of the solution is an area that needs improvement."
"The product has only a few built-in transformations."
"It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do. For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do. It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options."
"Sometimes, when we have large amounts of data that is very efficiently stored in Hadoop or Kafka, it is not very efficient to run it through StreamSets, due to the lack of efficiency or the resources that StreamSets is using."
"We've seen a couple of cases where it appears to have a memory leak or a similar problem."
"The execution engine could be improved. When I was at their session, they were using some obscure platform to run. There is a controller, which controls what happens on that, but you should be able to easily do this at any of the cloud services, such as Google Cloud. You shouldn't have any issues in terms of how to run it with their online development platform or design platform, basically their execution engine. There are issues with that."
"I would like to see further improvement in the UI. In addition, upgrades are not automatic and they should be automated. Currently, we have to manually upgrade versions."
"The software is very good overall. Areas for improvement are the error logging and the version history. I would like to see better, more detailed error logging information."
"Visualization and monitoring need to be improved and refined."
"Currently, we can only use the query to read data from SAP HANA. What we would like to see, as soon as possible, is the ability to read from multiple tables from SAP HANA. That would be a really good thing that we could use immediately. For example, if you have 100 tables in SQL Server or Oracle, then you could just point it to the schema or the 100 tables and ingestion information. However, you can't do that in SAP HANA since StreamSets currently is lacking in this. They do not have a multi-table feature for SAP HANA. Therefore, a multi-table origin for SAP HANA would be helpful."
"They need to improve their customer care services. Sometimes it has taken more than 48 hours to resolve an issue. That should be reduced. They are aware of small or generic issues, but not the more technical or deep issues. For those, they require some time, generally 48 to 72 hours to respond. That should be improved."
AWS Glue is ranked 1st in Cloud Data Integration with 37 reviews while StreamSets is ranked 8th in Data Integration with 24 reviews. AWS Glue is rated 7.8, while StreamSets is rated 8.4. The top reviewer of AWS Glue writes "Provides serverless mechanism, easy data transformation and automated infrastructure management". On the other hand, the top reviewer of StreamSets writes "We no longer need to hire highly skilled data engineers to create and monitor data pipelines". AWS Glue is most compared with AWS Database Migration Service, Informatica PowerCenter, SSIS, Informatica Cloud Data Integration and Talend Open Studio, whereas StreamSets is most compared with Fivetran, Azure Data Factory, Informatica PowerCenter, SSIS and webMethods.io Integration. See our AWS Glue vs. StreamSets report.
See our list of best Cloud Data Integration vendors.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.