AWS Glue Room for Improvement

Ajaykumar Myana - PeerSpot reviewer
Senior Software Developer at a computer software company with 10,001+ employees

In terms of performance, if they can further optimize the execution time for serverless jobs, it would be a welcome improvement. Faster code execution would be beneficial. If AWS could enhance the serverless execution capabilities, like increasing CPU, RAM, and processing speed, that would be great.

AmitMataghare - PeerSpot reviewer
Associate Director at a consultancy with 10,001+ employees

AWS Glue Studio has undergone a lot of enhancements in the last couple of months. The solution would improve if the user interface became more user-friendly and allowed transformations to be built with features like drag and drop. It would also be a good improvement if the product itself supported different kinds of transformations so that the pipelines we want to create could be built easily; right now, we have to write code to do so in our company. Only people who can code, either in Java or Python, can use the product freely. Those who don't know Java or Python might find using AWS Glue difficult.

AWS has pricing for spot instances that reduces the cost substantially, but that is not available for AWS Glue. Spot instance pricing is offered for products like EC2, and if the same were introduced for AWS Glue, the cost could drop substantially.

CE
Senior Software Engineer at a consumer goods company with 10,001+ employees

The start-up time is really high right now. For instance, when you start up a new job, you have to wait for five or eight minutes before it starts. If the start-up time were reduced to one or two minutes, it would be great.

It would be better to have a direct linkage to Redshift in AWS. If we could use data catalogs from Redshift, it would be much easier to create data catalogs. Currently, we can only use data catalogs from S3.

Buyer's Guide
AWS Glue
April 2024
Learn what your peers think about AWS Glue. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
ParamShah - PeerSpot reviewer
Engineering Manager at Milestone Technologies

There are limitations in the output and in the configuration of S3 partitions. There was a lot of trial and error that we had to go through. It is not clear how partition discovery would be affected by more data coming in. We've made some expensive mistakes that could have been avoided if tutorials or easy documentation with FAQs had been available. There is documentation, but it doesn't cover everything.

There are S3-specific partition changes, and AWS Glue is tightly tied to Athena. We don't have much flexibility in managing Athena.

AWS Glue could integrate with an AI model or a more advanced version that processes chat-based inputs rather than configuration. This would align it more closely with the functionalities of chat-based interfaces, making it easier to adopt.

Mbaye Babacar Gueye - PeerSpot reviewer
Owner at Bennen

One area that could be improved is the ETL view. The drag-and-drop interface is not as user-friendly as some other ETL tools. 

Additionally, AWS Glue can sometimes be slow, especially when processing large datasets. Also, I couldn't directly use bucketed data. With Glue, you had to convert your data frames into the correct format before connecting them using the drag-and-drop interface. I didn't like that because the conversion process wasn't straightforward.

In future releases, I would like to see a feature that could trigger a Glue pipeline using an API or something similar.
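
For readers who want to start a job programmatically, the boto3 Glue client exposes a `start_job_run` call. The following is only a minimal sketch under assumptions: the helper name `trigger_glue_job`, the job name and the job argument shown in the comment are hypothetical, and the client is passed in so the helper can be tried without AWS credentials.

```python
from typing import Any, Dict, Optional


def trigger_glue_job(
    glue_client: Any,
    job_name: str,
    arguments: Optional[Dict[str, str]] = None,
) -> str:
    """Start one run of an existing Glue job and return its run ID.

    glue_client is expected to behave like boto3.client("glue").
    """
    kwargs: Dict[str, Any] = {"JobName": job_name}
    if arguments:
        # Job arguments are passed as a string-to-string mapping.
        kwargs["Arguments"] = arguments
    response = glue_client.start_job_run(**kwargs)
    return response["JobRunId"]


# Hypothetical usage (needs boto3 and AWS credentials; names are placeholders):
#   import boto3
#   run_id = trigger_glue_job(boto3.client("glue"), "example-etl-job",
#                             {"--run_date": "2024-04-01"})
```

Wrapping the call this way keeps the trigger logic in one place, so it can be invoked from a scheduler, a script, or another service.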

Vimalathithan M - PeerSpot reviewer
Associate Director - Delivery (Technology DWH & Data Engineer) at MOBIUS KNOWLEDGE SERVICES PRIVATE LIMITED

We face performance issues when using AWS Glue for data transformation and integration. It takes almost three to four hours to execute single transformations, which is a lot. We want to improve the performance to meet customer requirements.

Mainly, I am focused on improving the performance aspect because the customer is keen on this improvement.

RajKumar23 - PeerSpot reviewer
Sr Associate at Cognizant

The solution’s stability could be improved.

Syed Zakaulla - PeerSpot reviewer
Project Manager at Softway

AWS Glue had some issues, which required optimization, particularly in terms of the number of workers you deploy, and that's where costing comes in. Cost-wise, AWS Glue is expensive, so that's an area for improvement. My company did some modifications, which turned out to be successful, so overall, the solution works fine.
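
The link between worker count and cost can be made concrete with a rough estimate. This is a sketch under stated assumptions, not AWS's billing logic: it treats cost as DPUs times hours times a per-DPU-hour rate, maps the G.1X and G.2X worker types to 1 and 2 DPUs, and ignores billing minimums. The function name and the rate used in the comment are placeholders; check current pricing for your region.

```python
# DPUs per worker for two common Glue worker types.
DPU_PER_WORKER = {"G.1X": 1.0, "G.2X": 2.0}


def estimate_job_cost(
    workers: int,
    worker_type: str,
    runtime_minutes: float,
    usd_per_dpu_hour: float,
) -> float:
    """Rough cost of one job run: DPUs x hours x rate (billing minimums ignored)."""
    if workers <= 0 or runtime_minutes < 0 or usd_per_dpu_hour < 0:
        raise ValueError("workers must be positive; runtime and rate non-negative")
    dpus = workers * DPU_PER_WORKER[worker_type]
    return dpus * (runtime_minutes / 60.0) * usd_per_dpu_hour


# Illustrative only: the 0.50 USD rate is a placeholder, not a quoted AWS price.
#   estimate_job_cost(10, "G.1X", 60, 0.50)  # ten 1-DPU workers for one hour
#   estimate_job_cost(5, "G.1X", 60, 0.50)   # half the workers, same runtime
```

Comparing a few worker counts this way shows why tuning the number of workers matters: halving the workers only saves money if the runtime does not double as a result.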

Even though there is a backup, you need to know what's happening. You need to understand why there's a failure. AWS Glue doesn't provide the information, so my company uses its logs. The development team also doesn't have specific answers because the team is still playing around with the process, which means the company is still trying to figure out other areas for improvement in AWS Glue.

The process for setting up the solution was also complex, which is another area for improvement.

AWS should provide help during migration and assist its users. Otherwise, it's a nightmare.

Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC

The mapping area and the use of the data catalog from Glue could be better. I would say those two are the main things we'd like to see improvements on.

The solution needs support for big data.

As I understand it, Glue is based on Lambdas and Lambdas have some limitations as far as running them continuously. Sometimes they get dropped, and they have to be reinitialized.

Neelabh Sharma - PeerSpot reviewer
Data Engineer at Scania

The product is expensive for data streaming compared to EMR. This area needs improvement.

Ankit Shukla - PeerSpot reviewer
Data Engineer at YASH Technologies

The monitoring is not that good. We'd like job progress to be clearer; right now, the way we can view it is not that good. The issue is that it is mostly Python or Scala code based. The UX is lacking.

There is a bit of a learning curve, particularly during the setup process. 

More connectors should be included.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

AWS Glue is more costly compared to other tools like Airflow. It would be better if the solution's pricing could be reduced. The default scheduling that AWS Glue provides is not as good as Airflow. The scheduler of AWS Glue could be improved because you cannot customize it.

NM
VP- Cloud Data/ Solution Architect at a financial services firm with 10,001+ employees

I have encountered challenges with multi-region support. 

Sunil Morya - PeerSpot reviewer
Consultant at a tech vendor with 10,001+ employees

I haven't looked into Glue in terms of seeking out flaws. I've not come across missing features. 

Liana Iuhas - PeerSpot reviewer
CEO at Quark Technologies SRL

The interface for AWS Glue could improve; it does not show a lot of detail. You can write the code in PySpark or in Scala, which is a big advantage, but it is only easy to use for a developer. It will be difficult for new users to enter the cloud environment.

Business users who want to run their own graphs will not be able to use features such as running Spark code inside AWS Glue, which would be too complex for them.

YC
Data Engineer at Tata Consultancy

The solution needs to extend its 30-minute query or runtime limit. Sometimes it fails with certain data types such as Athena due to the limited runtime. Some large data sets run over time during busy hours, so we try to avoid failures by running data at idle times or at night.

On occasion, the solution's dashboard reports that a project failed due to runtime but it actually succeeded. This can be quite confusing. 

Murilo Hallgren - PeerSpot reviewer
Data Engineer at a consultancy with self employed

The price of the solution could improve.

DS
ECM CONSULTANT/ARCHITECT/SOFTWARE DEVELOPER, DELUXE MN at a tech services company with 5,001-10,000 employees

There is a learning curve to this tool.

UK
Consultant - Business Operations at a computer software company with 10,001+ employees

The setup and installation is a bit complex without advanced knowledge or training. It would be easier for an AWS expert or someone in DevOps.

Transformations need to become more user-friendly and rely less on coding, as in Matillion.

Jorge Encinas - PeerSpot reviewer
Sr. Data Engineer at a tech services company with 5,001-10,000 employees

It would be better if it were more user-friendly. The interesting thing we found is that it was a little strange at the beginning. The way Glue works is not very straightforward. After trying different things, for example, we used just the console to create jobs. Then we realized that things were not working as expected. After researching and learning more, we realized that even though the console creates the script for the ETL processes, you need to modify or write your own script in Spark to do everything you want it to do.

For example, we are pulling data from our source database and our application database, which is in Aurora. From there, we are doing the ETL to transform the data and write the results into Redshift. But what was surprising is that it's almost like whatever you want to do, you can do it with Glue because you have the option to put together your own script. Even though there are many functionalities and many connections, you have the opportunity to write your own queries to do whatever transformations you need to do.

It's a little deceiving that some options are supposed to work in a certain way when you set them up in the console, but then they are not exactly working the right way or not as expected. It would be better if they provided more examples and more documentation on options.

Senthil Kumar Veerasamy - PeerSpot reviewer
Senior Manager, Analytics at Azendian

Since AWS Glue is not like an enterprise ETL tool, we need to put quite a lot of effort into customization. The solution has a visual editor, but most ETL transformations cannot be implemented or constructed using that. We always have to do a script. The solution's visual ETL tool is of no use for actual implementation.

ShilpaShivapuram - PeerSpot reviewer
Principal Data Architect at Wells Fargo

AWS Glue would be improved by making it easier to switch from single to multi-cloud.

Shifa Shah - PeerSpot reviewer
Data engineer at nust

While working on AWS Glue, I could not find any training material for it. Although it's not a problem with the product, the solution could include better documentation.

Adriano Junior Gouveia Gonçalves - PeerSpot reviewer
Professor at a tech services company with 51-200 employees

Glue could perform better. It sometimes takes too long to test a Glue job. Google Cloud Platform offers more Python scripts than AWS. 

SP
Associate Consultant at a tech vendor with 10,001+ employees

The solution could be cheaper. The price of the solution is an area that needs improvement.

Sainagaraju Vaduka - PeerSpot reviewer
Data solution architect at a pharma/biotech company with 5,001-10,000 employees

I would like to see stable libraries; at the moment, they are not there.

YB
Consultant Data junior at a computer software company with 51-200 employees

The product has only a few built-in transformations; support for building additional custom transformations could be improved in the next release.

For additional features, I would like documentation on the equivalent of legacy ETL tools and their equivalent in AWS to make it easier for users to migrate their ETL processing to the cloud. It would save time and help users find the best transformation or solution to satisfy their new business needs.

BV
Manager at a construction company with 51-200 employees

In general, I would like to see documentation on the limitations on which libraries you can actually pull in when you are running Python. The addition of the Python Jupyter Notebook has been nice. But generally speaking, you cannot import every library. You can import pandas now and you can use photos, but you cannot import a lot of the other statistics-based libraries. That is an issue currently. I would also like to see a more robust interface on the no-code side. It would be nice to be able to split cells.
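
Until such documentation exists, one way to find out what a given Python runtime can import is to probe for packages from inside it. The sketch below uses only the standard library; the helper name `check_libraries` and the package names in the comment are illustrative assumptions, and it handles top-level package names only.

```python
import importlib.util
from typing import Dict, Iterable


def check_libraries(names: Iterable[str]) -> Dict[str, bool]:
    """Report, per top-level package name, whether it can be found for import."""
    # find_spec returns None when a top-level package is not installed.
    return {name: importlib.util.find_spec(name) is not None for name in names}


# Hypothetical usage inside a job script or notebook cell:
#   print(check_libraries(["pandas", "scipy", "statsmodels"]))
```

Running a check like this at the top of a job gives a clear picture of what is missing before any transformation code fails on an import.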

MA
Cloud Data Engineer at jems groupe

The solution does not work with Spark DataFrame. We can use the solution's DynamicFrame for this function but transformations are expensive. 

Not enough resources or services are available to run managed Spark jobs within the solution. We have reached out to Amazon many times regarding this issue. 

The solution should offer features for streaming data in addition to batching data. We can use other products such as Scala or Python but prefer the features be available in the solution. 

GV
Data Engineer at a computer software company with 501-1,000 employees

They should improve the solution's performance with large amounts of data. Currently, AWS fails to handle massive databases acquired from various sources. Also, it is challenging to queue the data or use standard code in the AWS environment. We need to install a third-party tool to tackle the issue. We need to use another tool to convert the data as well. Thus, we are using multiple tools to handle the database. They should work on this particular area.

Sashi Dhar - PeerSpot reviewer
Operations executive at Wipro Infotech

There should be more connectors for different databases.

BR
CEO and Founder at HartB

The crucial problem with AWS Glue is that it only works with AWS. It is not an agnostic tool like Pentaho. In PowerCenter, we can install the forms from Google and other vendors, but in the case of AWS Glue, we can only use AWS.

Suraj Sachdeva - PeerSpot reviewer
Data Engineer | Developer at Sakshath Technologies

The technical support for this solution could be improved. In future, we would like to connect more services like Athena or Kinesis to help control more loads of data.

DB
Net Full-Stack developer at a tech services company with 201-500 employees

When there is a need to configure connections to different database sources in respect of the target, it would be good if it were easier to deal with roles. I am referring to the need to configure connections in a different target process, something which would require a certain time outlay for configuring VPC and checking that everything is okay, in respect of the creation of required roles. It would save time were this process to be made easier and more user friendly. 

The technical support depends on the type of question, whether there is a need to understand additional inter-related information on multiple levels. Overall, I consider the technical support to be fine, although the response time could be faster in certain cases. 

AS
Team Lead at a financial services firm with 5,001-10,000 employees

Currently, it supports only two languages in the background: Python and Scala. From our customization point of view, it would be helpful if it could also support Java in the background.

KM
Cloud Solution Architect at a tech services company with 1-10 employees

In terms of improvement, the performance of AWS Glue could be faster.

Diksha Hirole - PeerSpot reviewer
Data Engineer at a tech services company with 201-500 employees

There are a couple of issues with AWS Glue. First, the AWS Console randomly logs off, which disrupts coding. Second, if there's a cluster-related configuration, we have to set up worker nodes, which is quite a headache when processing a large amount of data. In the next release, AWS Glue should include more transformations in AWS Glue Studio.
