We asked business professionals to review the solutions they use. Here are some excerpts of what they said:
"One of the best features of the solution is its ability to easily integrate with other AWS services."
"Data catalog and triggers are the two best features for me. AWS Glue has its own data catalog, which makes it great and really easy to use. Triggers are also really good for scheduling the ETL process."
"Its user interface is quite good. You just need to choose some options to create a job in AWS Glue. The code-generation feature is also useful. If you don't want to customize it and simply want to read a file and store the data in the database, it can generate the code for you."
"The facility to integrate with S3 and the possibility to use Jupyter Notebook inside the pipeline are the most valuable features."
"Once you have Infosphere up and running properly, it is stable."
"We are mostly using transmission rules. It has a lot of functions and logic related to transmission. It is a user-friendly tool with in-built functions."
"Offers great flexibility."
"The most valuable feature is the product's versatility to inject data."
"The Hierarchical Data Stage is good."
"It's a robust solution."
"ETL is the most valuable feature."
"As a data integration platform, it is easy to use. It is quite robust and useful for volumetric analysis when you have huge volumes of data. We have tested it for up to ten million rows, and it is robust enough to process ten million rows internally with its parallel processing. Its error logging mechanism is far simpler and easier to understand than other data integration tools. The newer version of InfoSphere has the data catalog and IDC lineage. They are helpful in the easy traceability of columns and tables."
"The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time is reduced to one or two minutes, it will be great. It will be better to have a direct linkage to Redshift in AWS. If we can use data catalogs from Redshift, it will be so easy to create some data catalogs. Currently, we can only use data catalogs from S3."
"Overall, I consider the technical support to be fine, although the response time could be faster in certain cases."
"Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it can also support Java in the background."
"The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS."
"Its documentation is not up to the mark. While building APIs, we had a lot of problems trying to get around it because it is not very user-friendly. We tried to get hold of API documentation, but the documentation is not very well thought out. It should be more structured and elaborate. In terms of additional features, I would like to see good reporting on performance and performance-tuning recommendations that can be based on AI. I would also like to see better data profiling information being reported on InfoSphere."
"Currently lacking virtualization ability."
"The initial setup could be more straightforward."
"The setup is extremely difficult."
"The template mapping could be easier."
"The interface needs improvement. It is really too technical. That is the main problem."
"It would be useful to provide support for Python, AR, and Java."
"There are three things that could improve - the cloud, monitoring and cloud integration. It's a solid product but not a modern one and of course it depends what you're looking for."
"The pricing is a bit higher than other solutions like Athena and EC2. If the pricing becomes more scaled or flexible, it will be good because you have to pay 44 cents just for one DPU for an hour. If you increase DPUs to 5 or 10, the pricing gets multiplied. There are also some time limits like 0 to 10 minutes or 10 to 20 minutes. If the pricing is according to the minutes, it would be better because you have to limit your job to 10 minutes or 20 minutes."
"Its price is good. We pay as we go or based on the usage, which is a good thing for us because it is simple to forecast for the tool. It is good in terms of the financial planning of the company, and it is a good way to estimate the cost. It is also simple for our clients. In my opinion, it is one of the best tools in the market for ETL processes because of the fact that you pay as you use, which separates it from other big tools such as PowerCenter, Pentaho Data Integration, and Talend."
"It is not expensive. AWS Glue works on the serverless architecture. We get charged for the time the server is up. For our use case, we have to use it once in a day, and it is not expensive for us."
"It's quite expensive."
"It's very expensive."
"It is quite expensive."
"The cost is too high."
"The price is expensive but there are no licensing fees."
"Our internal team takes care of group licensing and cost. We don't have individual licenses. We have group licensing at the company level. Usually, IBM doesn't charge anything separately on the licensing side. For storage and everything else, we are paying around $6,000 per month, which is not very high. It includes Linux data storage, execution, and licensing. They're charging $40 for one-hour execution. Based on that, we are spending around $2,000 on the production environment and $1,000 on the lower environment for testing and development-side executions. For the mainframe, we are using the Db2 mainframe database, and we are spending around $1,000 on the Db2 mainframe database as well. All this comes out to be around $6,000. We, however, would like to have some cost reduction."
AWS Glue is a fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. You can create and run an ETL job with a few clicks in the AWS Management Console. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g. table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL.
AWS Glue is ranked 5th in Cloud Data Integration with 4 reviews while IBM InfoSphere DataStage is ranked 6th in Data Integration Tools with 11 reviews. AWS Glue is rated 7.8, while IBM InfoSphere DataStage is rated 7.4. The top reviewer of AWS Glue writes "Improved our time to implement a new ETL process and has a good price and scalability, but only works with AWS". On the other hand, the top reviewer of IBM InfoSphere DataStage writes "Robust, easy to use, has a simple error logging mechanism, and works very well for huge volumes of data". AWS Glue is most compared with Talend Open Studio, AWS Database Migration Service, Informatica PowerCenter, SSIS and Informatica Enterprise Data Catalog, whereas IBM InfoSphere DataStage is most compared with SSIS, Talend Open Studio, Azure Data Factory, Informatica PowerCenter and IBM InfoSphere Information Server.
We monitor all Cloud Data Integration reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.