Apache Airflow Valuable Features

SUDHIR KUMAR RATHLAVATH - PeerSpot reviewer
Student at University of South Florida

Every feature in Apache Airflow is valuable. The operators and features I've used are mainly related to connectivity and integration services because I primarily work with GCP. Specifically, I've used the BigQuery connectors and operators, as well as Python operators and other runnable operators like Bash. These common operators have been quite useful in my work.

Another thing that stands out is its ease of use for developers familiar with Python. They can simply write their code and set up their environment to run the code through the scheduling engine. It's quite convenient, especially for those in the data engineering field who are already well-versed in Python. They don't need to install any additional tools or perform complex environment setups. It's straightforward for them.

The graphical interface is good because it runs on a DAG (Directed Acyclic Graph).
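To illustrate what a DAG means for scheduling, here is a minimal sketch in plain Python using the standard-library graphlib module (Python 3.9+). The task names are hypothetical, and this is not Airflow's own scheduler code; it only shows how declared dependencies determine a valid run order.

```python
from graphlib import TopologicalSorter

# Hypothetical pipeline: each task maps to the set of tasks it depends on.
dependencies = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "notify": {"load"},
}

def run_order(deps):
    """Return one valid execution order for an acyclic dependency graph."""
    return list(TopologicalSorter(deps).static_order())

order = run_order(dependencies)
print(order)
```

Because the graph must be acyclic, a dependency loop is rejected up front (graphlib raises CycleError) rather than causing tasks to wait on each other forever.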

FB
Product Owner at La Poste S.A.
It's well-documented and has plenty of resources online, making it easy to get started. On the other hand, it offers few options for visually designing workflows and automatically generating processes, as some dedicated ETL tools do.
Damian Bukowski - PeerSpot reviewer
Program Python at Santander Bank Polska

I like that Apache Airflow is written in Python, which makes it easy to use and learn. I also like its versatility. Essentially, if you want to do something, there is generally a webhook that you can use with Apache Airflow, especially if you use solutions from big companies like Google or Microsoft. Many providers do not come from Apache itself, since it is very easy to develop and integrate applications from various developers.

Buyer's Guide
Apache Airflow
April 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
MW
Analytics Solution Manager at Telekom Malaysia

Since the solution is programmatic, it allows users to define pipelines in code rather than drag and drop.

Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel

Our data workflow management is greatly streamlined by the use of Apache Airflow, which proves highly beneficial. Its user-friendly interface makes it straightforward to operate, offering a plethora of features for data preparation, buffering, and format conversion. With its extensive capabilities, Airflow serves as a comprehensive tool for managing our data workflows effectively.

Ravan Nannapaneni - PeerSpot reviewer
Senior Lead Engineer at Oliver Wyman

The most valuable feature of Apache Airflow is creating and scheduling jobs. Additionally, the reattempt at failed jobs is useful.
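As a rough illustration of automatic reattempts, the sketch below retries a failing job in plain Python. In Airflow itself this behavior is typically configured through the retries and retry_delay task arguments; the function run_with_retries and the flaky job here are hypothetical and only show the idea, not Airflow's implementation.

```python
import time

def run_with_retries(task, retries=2, retry_delay=0.0):
    """Call task(); on an exception, retry up to `retries` more times.

    Returns the task result together with the number of attempts used.
    """
    attempt = 0
    while True:
        attempt += 1
        try:
            return task(), attempt
        except Exception:
            if attempt > retries:
                raise  # retries exhausted: surface the last failure
            time.sleep(retry_delay)

# Hypothetical flaky job: fails twice, then succeeds.
calls = {"count": 0}

def flaky_job():
    calls["count"] += 1
    if calls["count"] < 3:
        raise RuntimeError("transient failure")
    return "done"

result, attempts = run_with_retries(flaky_job, retries=2)
print(result, attempts)
```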

Punit_Shah - PeerSpot reviewer
Director at Smart Analytica

One of its most valuable features is the graphical user interface, providing a visual representation of the pipeline status, successes, failures, and informative developer messages. This graphical interface greatly enhances the user experience by offering clear insights into the pipeline's status.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

The best thing about the product is its UI. The tool is user-friendly. We can divide our work into different tasks and groups. It gives a graphical representation of the whole flow. It also creates a graph of the complete pipeline. The UI is beautiful. Whenever there is a failure, we can see it at the backend. We can retry at the point where the failure happened. We do not have to redo the whole flow. The user interface is pretty good. It provides details about the jobs. It also provides monitoring features. We can see the metrics and the history of the runs. The administration features are good. We can manage the users.

ManojKumar43 - PeerSpot reviewer
Big Data Engineer at BigTapp Analytics

Apache Airflow is easy to use and can monitor task execution easily. For instance, when performing setup tasks, you can conveniently view the logs without delving into the job details. All logs are readily accessible within the interface itself. Examining the logs lets you discern which steps and processes are being executed.

You don't have to configure SMTP for everything. You only need to configure email settings, such as email on error, failure, or alert. With Apache Airflow, you can send emails with just a few lines of code. You don't have to write extensive code to configure SMTP; all those configurations can be accomplished within a few lines of code.

I managed a complex workflow for a finance application project. They use Apache Airflow to orchestrate processes, such as retrieving data from SFTP and landing it into S3. From S3, they trigger Glue jobs based on certain conditions. Additionally, they use the Glue catalog in Glusoft for data management, all orchestrated using Airflow. Furthermore, various logics are written in Airflow DAGs to handle scenarios like security mismatches. For instance, files are sent accordingly if there's a missing security.

Apache Airflow triggers a set of tasks based on DAGs. If you have multiple DAGs, such as for the raw, transform, and ready layers, you don't have to trigger each DAG manually. Instead, you can integrate them so that triggering one automatically triggers the others. You can also add conditions.
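The chaining described here can be sketched in plain Python. Airflow provides TriggerDagRunOperator for having one DAG start another; the sketch below does not use it, and the layer jobs, row counts, and the run_layers helper are all hypothetical. It only shows the pattern of each layer triggering the next when a condition holds.

```python
def run_layers(layers, should_continue):
    """Run layers in order; a finished layer triggers the next one
    only if should_continue(layer_name, result) is true."""
    completed = []
    for name, job in layers:
        result = job()
        completed.append(name)
        if not should_continue(name, result):
            break  # condition failed: do not trigger downstream layers
    return completed

# Hypothetical layer jobs, each returning the number of rows it produced.
layers = [
    ("raw", lambda: 120),
    ("transform", lambda: 0),
    ("ready", lambda: 0),
]

# Condition: only trigger the next layer when the current one produced rows.
completed = run_layers(layers, lambda name, rows: rows > 0)
print(completed)
```

In this run the transform layer produces no rows, so the ready layer is never triggered.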

Pravin Gadekar - PeerSpot reviewer
Google Cloud Architect at Capgemini

The product's most valuable feature is scalability. It helps us run hundreds of data jobs every day.

Mikalai Surta - PeerSpot reviewer
Head of Big Data Department at IBA Group

Since it's widely adopted by the community, Apache Airflow is a user-friendly solution.

AS
Associate Data Engineer at an outsourcing company with 201-500 employees

The most valuable feature is that it's the most popular data orchestration tool in the market right now. It connects to everything you need.

It's open-source. You have a lot of documentation and a lot of people helping out. It has large communities, so if you need something or you want to ask something, you can. Often, someone else would have already asked that question, and they would have already got the answer, and you can just look it up.

Development on Apache Airflow is really fast, and it's easy to use with the newer updates. Everything is in Python, so it's not hard to understand. They also have a graphical view, so if you are not a programmer and you are just an administrator, you can easily track everything and see if everything is working or not. For notifications, it can connect with different messaging tools such as Slack and Teams, as well as with webhooks. It's very easy to use, and it has a lot of features that you would expect from any of the data orchestration tools.

Nomena NY HOAVY - PeerSpot reviewer
Lead Data Scientist at MVola

The user experience of Apache Airflow is good. The solution is flexible for all programming languages for all frameworks. I also value that it is used for monitoring. Apache Airflow helps to easily integrate data sources with other products.

AT
Lead of Monitoring Tech at an educational organization with 1,001-5,000 employees

We are already on Python. Since Apache Airflow works very well with Python, we can manage everything and create pipelines there.

VenugopalKathirvel - PeerSpot reviewer
Senior Member Of Technical Staff, Engineering Operations at VMware

Apache Airflow's best feature is its flexibility.

MW
Analytics Solution Manager at Telekom Malaysia

The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.

It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.

JR
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees

I like the UI rework, it's much easier.

I use XCom for derived variables that need to pass between tasks. I don't really tend to use it for passing data, but only for a derived variable. For example, I don't have to re-query something in every task that uses it. I use the JSON configuration for overwriting certain parameters.

In our use cases, some of the inputs of the dataset are files that we pulled out of S3. Sometimes they need to re-do those files, but we don't need to change any logic, we just need to redo the bills. Rather than redeploying the code to point to a new S3 bucket, we overwrite it to point to a different S3 key.
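A minimal sketch of this kind of override, in plain Python: Airflow allows a run to be triggered with a JSON configuration (exposed to tasks as dag_run.conf), and the idea is to merge that JSON over defaults that live in the deployed code. The bucket and key names and the resolve_params helper are hypothetical, not taken from the reviewer's pipeline.

```python
import json

# Hypothetical defaults baked into the deployed pipeline code.
DEFAULT_PARAMS = {"bucket": "example-bucket", "key": "inputs/2024-01.csv"}

def resolve_params(defaults, conf_json=None):
    """Merge run-time JSON overrides on top of the code's defaults."""
    overrides = json.loads(conf_json) if conf_json else {}
    unknown = set(overrides) - set(defaults)
    if unknown:
        # Reject typos instead of silently ignoring them.
        raise KeyError(f"unknown parameters: {sorted(unknown)}")
    return {**defaults, **overrides}

# Re-run against a corrected input file without redeploying any code.
params = resolve_params(DEFAULT_PARAMS, '{"key": "inputs/2024-01-redo.csv"}')
print(params)
```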

I have read that there are many different workflow pipelining tools in the biotech space, such as Snakemake and Nextflow.

There is also a CWL plugin that we may look into at some point. 

Eventually, we might have a use case where a researcher has a pipeline they run locally, and then we want to convert that to a DAG. 

The CWL-Airflow plugin would be useful for that. This might be something to look into later. But that would be like months, or maybe a year from now.

Luiz Cesar Gosi - PeerSpot reviewer
Senior Analytics Engineer at TalkDesk

Apache Airflow is a pretty useful tool for collecting information. Apache Airflow is a pretty easy solution that can be used with Python. The solution's UI allows me to collect all the information and see the code lines.

YS
Software engineer at Naver Corp

Kubernetes from the batch application is the most useful to my team. It uses Python. It is simple. There are not many learning costs. We're using the scheduler. We don't need to care about the batch job every day. We just need to notice when the alerts are firing. It is convenient for us. The product supports many other services, like Kubernetes. I saw some custom applications and programs. The solution integrates very well with other products.

Mahendra Prajapati - PeerSpot reviewer
Senior Data Analytics at a media company with 1,001-5,000 employees

The best feature is the customization that can be done using Python. For example, there are use cases where we have to tweak the algorithm and with Apache Script Rate, we have extra functionality that helps to change the underlying process. We can define our algorithms and processes using Python.

Fadi Bathish - PeerSpot reviewer
Project Manager at Siren Analytics

The solution is quite configurable so it is easy to code within a configuration kind of environment. 

The ease of learning and using the solution is quite good. The learning curve is low so new users can learn in a short period of time in comparison to other products. 

Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC

The ability to easily set up and deploy workflows with Airflow is valuable. Additionally, designing processes and workflows is easier, and it assists in coordinating all of the different processes.

SG
Engineering Manager - OTT Platform at Amagi

The reason we went with Airflow is its DAG presentation, which shows the relationships among everything. It's more of a configuration-driven workflow.

It's all Python, as well. The majority of the configuration is Python-friendly.

JP
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees

The product integrates well with other pipelines and solutions.

The ease of building different processes is very valuable to us. Compared with Kafka, Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.

AN
Solution Architect at EPAM Systems

The most valuable feature is the UI for automation. One can monitor all ETL processes on a single screen. Complex workflows are shown as DAGs rendered as SVG images.

This is a simple tool to automate using Python.

AJ
Associate Director - Technologies at a tech services company with 51-200 employees

The most valuable feature is the workflow.

CP
Virksomhedskonsulent - Digitalisering, Forretningsudvikling, BPM, Teknologi & Innovation at a consultancy with 51-200 employees

I do not have specific feedback because it is quite early in the review stage for comment.
