We are implementing a solution in AWS for one of our customers. It is more of a data analytics solution. We wanted to process data from different sources and put it into a central repository that can be used for any analysis or predictive modeling.
We use the solution to build tables on CSV data. We get data from several different sources, pull it into S3, and then create tables using Glue to get some metrics out of that data.
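As a rough illustration of the "create tables on CSV data" step, the sketch below builds the kind of table definition that the Glue Data Catalog accepts for CSV files in S3. The bucket path, table name, database name, and the simple type-inference helper are all hypothetical; the reviewer does not say whether tables were created by a crawler or by hand, so treat this as one possible approach rather than what was actually done.

```python
import csv
import io


def infer_glue_type(values):
    """Pick a Hive-style column type that fits every non-empty sample value."""
    non_empty = [v for v in values if v != ""]
    if not non_empty:
        return "string"
    for glue_type, caster in (("bigint", int), ("double", float)):
        try:
            for value in non_empty:
                caster(value)
            return glue_type
        except ValueError:
            continue
    return "string"


def build_table_input(table_name, s3_location, csv_sample):
    """Build a TableInput dict for a CSV-backed external table.

    The column list is inferred from a small CSV sample (header + rows).
    """
    rows = list(csv.reader(io.StringIO(csv_sample)))
    header, data = rows[0], rows[1:]
    columns = [
        {
            "Name": name.strip().lower(),
            "Type": infer_glue_type([r[i] for r in data if i < len(r)]),
        }
        for i, name in enumerate(header)
    ]
    return {
        "Name": table_name,
        "TableType": "EXTERNAL_TABLE",
        "Parameters": {"classification": "csv", "skip.header.line.count": "1"},
        "StorageDescriptor": {
            "Columns": columns,
            "Location": s3_location,
            "InputFormat": "org.apache.hadoop.mapred.TextInputFormat",
            "OutputFormat": "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat",
            "SerdeInfo": {
                "SerializationLibrary": "org.apache.hadoop.hive.serde2.lazy.LazySimpleSerDe",
                "Parameters": {"field.delimiter": ","},
            },
        },
    }


# Hypothetical sample and names, for illustration only.
sample = "order_id,amount,region\n1,9.5,EU\n2,12,US\n"
table_input = build_table_input("orders", "s3://example-bucket/orders/", sample)

# With AWS credentials configured, the definition could be registered like this:
# import boto3
# boto3.client("glue").create_table(DatabaseName="example_db", TableInput=table_input)
```

Once a table like this exists in the catalog, metrics can be computed with SQL in Athena or from a Glue job, which matches the workflow described above.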
One common use case is migrating data from one system to another. Our work is mostly data migration and data engineering: getting real-time or near-real-time data using Lambda functions, and migrating historical big data from on-prem to the cloud before starting a project.
In my company, we use AWS Glue to build data engineering pipelines, so we ingest data from either S3 or other sources and put it back into Redshift, where we have a data lake or data warehouse.
Senior Software Developer at a computer software company with 10,001+ employees
Real User
Top 10
2023-07-31T17:41:50Z
Jul 31, 2023
I had the source data, which was unstructured and non-fixable, and my responsibility was to convert it into structured data. For this task, I used PySpark as the programming language. With Python, I implemented the creation of a data frame using Glue jobs. Since Glue jobs are a serverless mechanism, I deployed my code into the Glue job, and that's how I got the job done.
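The review does not show the job itself. Below is a minimal sketch of the kind of parsing step such a job might contain, written in plain Python so it runs without a Spark cluster. The line format, field names, and regular expression are hypothetical; in a Glue PySpark job, the same per-line function could be applied before building the data frame.

```python
import re

# Hypothetical raw line format: "<timestamp> <LEVEL> user=<name> <free text>"
LINE_PATTERN = re.compile(
    r"^(?P<timestamp>\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2})\s+"
    r"(?P<level>[A-Z]+)\s+"
    r"user=(?P<user>\S+)\s+"
    r"(?P<message>.*)$"
)


def parse_line(line):
    """Return a dict of named fields, or None if the line does not match."""
    match = LINE_PATTERN.match(line.strip())
    return match.groupdict() if match else None


def to_structured(lines):
    """Split raw lines into structured records and rejected lines."""
    records, rejected = [], []
    for line in lines:
        record = parse_line(line)
        if record is None:
            rejected.append(line)
        else:
            records.append(record)
    return records, rejected


raw_lines = [
    "2023-07-31T17:41:50 INFO user=alice logged in",
    "this line has no recognizable structure",
]
records, rejected = to_structured(raw_lines)
```

Keeping the rejected lines separate, rather than silently dropping them, makes it easier to see how much of the source could not be structured.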
Associate Director - Delivery (Technology DWH & Data Engineer) at MOBIUS KNOWLEDGE SERVICES PRIVATE LIMITED
Real User
Top 20
2023-04-26T09:07:00Z
Apr 26, 2023
Our primary use cases include pulling data from multiple sources and loading it into the central capacity for data transformation, integration, and processing.
Currently, we are utilizing AWS Glue for various ETL workloads, specifically in the life sciences domain. Our primary objective is to acquire data from various sources. Then, we store it in Redshift. This is where the complete use case of AWS Glue comes into the picture.
We're using GPU 0.2 in ten verticals and wanted to use AWS Glue only for one purpose: to optimize Amazon Redshift. We have millions of records that we have to back up. Previously, we did it once every six months, but the client data has become very interactive, and we need continuous back-and-forth data communication in real time. In one second, we have almost one million records that come and go continuously. The client wanted to keep all data because they're using it for analytics and wanted to back up the data every second without delay. We tried to optimize Amazon Redshift and found out about AWS Glue, which comes with massive costs, but the client is willing to pay.
We use the solution for the usual types of transformations that previously required ETL tools. Our purposes are mostly transformation-related, including transforming data from source to target. We are also replacing our usual ETL tools with Glue.
We are using AWS Glue for transforming firewalls synced to the Data Lake in the bronze zone. The ATL uses the solution to transform fields in the silver layer and later we will produce the gold zone. We are using the Delta Lake Architecture.
My colleagues work with Spark, PySpark, and Scala as programming languages for writing complex aggregations. They have a repository in order to have a general view of all the sources and jobs on the platform and AWS Glue is very helpful.
Data Engineer | Developer at Sakshath Technologies
Real User
Top 10
2022-06-21T13:28:38Z
Jun 21, 2022
The key role of Glue is that it hosts our metadata before rolling out our actual data. This is the major advantage of using this solution, and our clients have been very satisfied with it.
Sr. Data Engineer at a tech services company with 5,001-10,000 employees
MSP
Top 20
2022-06-16T15:42:50Z
Jun 16, 2022
We used AWS Glue to build our data warehouse. We built prototypes to go all the way across their warehouse platforms. From AWS Glue to Spreadsheets and then QuickSight, that's how we're building their warehouse.
ECM CONSULTANT/ARCHITECT/SOFTWARE DEVELOPER, DELUXE MN at a tech services company with 5,001-10,000 employees
Real User
2021-12-02T16:14:50Z
Dec 2, 2021
Glue is a NoSQL-based data ETL tool that has some advantages over IIS and ISAs. It is tailored and customized to use with SQL Server, which works very well in that platform. If you want to use other data sources, the NoSQL concept makes it very easy, because missing data can be inserted as a new column or with null values. That is not the case with many other tools. If you have on-premises tools, such as IIS, they don't manage missing data well.
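The handling of missing data described in this review can be illustrated in plain Python. This is only a sketch of the idea (merge the columns seen across records and fill gaps with nulls), not Glue's actual implementation; the function name and sample records are hypothetical.

```python
def unify_schema(records):
    """Return (columns, rows) where every row has every column.

    Columns are collected in first-seen order; a field missing from a
    record is filled with None instead of causing a failure.
    """
    columns = []
    for record in records:
        for key in record:
            if key not in columns:
                columns.append(key)
    rows = [{col: record.get(col) for col in columns} for record in records]
    return columns, rows


# Hypothetical records from two sources with different fields.
source_records = [
    {"id": 1, "name": "a"},
    {"id": 2, "email": "x@example.com"},
]
columns, rows = unify_schema(source_records)
```

A field that appears only in later records simply becomes a new column, and earlier records carry a null for it.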
It is a good tool for us. All the implementation in our company is done with AWS Glue. We use it to execute all the ETL processes. We have collected more or less five terabytes of information from the internet by now. We process all this data in our cloud platform and normalize the information. We first put it on a data lake that we have here on the AWS tool. After that, we use AWS Glue to transform all the information collected around the internet and put the normalized information into a data warehouse.
AWS Glue is a serverless cloud data integration tool that facilitates the discovery, preparation, movement, and integration of data from multiple sources for machine learning (ML), analytics, and application development. The solution includes additional productivity and data ops tooling for running jobs, implementing business workflows, and authoring.
AWS Glue allows users to connect to more than 70 diverse data sources and manage data in a centralized data catalog. The solution facilitates...
AWS Glue is a versatile tool and we mostly use it for "lift and shift" server migrations.
We use AWS Glue for ETL batch processing purposes.
We use AWS Glue for data analytics.
I constructed a straightforward ETL job using AWS Glue, wherein I had to load a couple of files in the Teradata database.
The primary use cases of AWS Glue in our organization are for implementing ETL processes and for data flow.
Our primary use case is ETL.
We are primarily using it for batch processing and transformations.
We are using it for day-to-day ETL jobs. It is being used to transfer data from Teradata to the cloud. We are using its latest version.
I mainly use AWS Glue for ETL purposes and batch processing of data.
We use the solution as a layer for loading data from the source systems.
We are using it for file ingestion. Its primary role is to ingest a file from a vendor to a database.
We are collecting some TV audience data and analyzing it.