Apache Airflow Overview

Apache Airflow is the #8 ranked solution in BPM Software. IT Central Station users give Apache Airflow an average rating of 8 out of 10. Apache Airflow is most commonly compared to Camunda Platform. Apache Airflow is popular among the midsize enterprise segment, which accounts for 58% of users researching this solution on IT Central Station. The top industry researching this solution is computer software, with professionals from computer software companies accounting for 24% of all views.
What is Apache Airflow?

Airflow is a platform to programmatically author, schedule and monitor workflows.

Use Airflow to author workflows as directed acyclic graphs (DAGs) of tasks. The Airflow scheduler executes your tasks on an array of workers while following the specified dependencies. Rich command line utilities make performing complex surgeries on DAGs a snap. The rich user interface makes it easy to visualize pipelines running in production, monitor progress, and troubleshoot issues when needed.

When workflows are defined as code, they become more maintainable, versionable, testable, and collaborative.
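
As a rough illustration of the DAG idea described above, the following plain-Python sketch runs tasks in dependency order using the standard library's `graphlib` (Python 3.9+). It is not Airflow code: the task names (`extract`, `transform`, `load`, `report`) and the `run_in_dependency_order` helper are hypothetical, and a real Airflow scheduler also handles scheduling, retries, and distribution across workers.

```python
from graphlib import TopologicalSorter


def run_in_dependency_order(tasks, upstream):
    """Run each callable in `tasks` after all of its upstream tasks.

    tasks:    dict mapping task name -> zero-argument callable
    upstream: dict mapping task name -> set of task names it depends on
    Returns the list of task names in the order they were run.
    """
    order = []
    # static_order() raises graphlib.CycleError if the graph is not acyclic.
    for name in TopologicalSorter(upstream).static_order():
        tasks[name]()
        order.append(name)
    return order


# Hypothetical pipeline: extract -> transform -> (load, report)
log = []
tasks = {
    name: (lambda n=name: log.append(n))
    for name in ("extract", "transform", "load", "report")
}
upstream = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "report": {"transform"},
}
order = run_in_dependency_order(tasks, upstream)
```

Because `load` and `report` both depend only on `transform`, either may run first; only the dependency constraints are guaranteed.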

Apache Airflow is also known as Airflow.

Apache Airflow Buyer's Guide

Download the Apache Airflow Buyer's Guide including reviews and more. Updated: November 2021

Apache Airflow Customers

Agari, WePay, Astronomer

Pricing Advice

What users are saying about Apache Airflow pricing:
  • "Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost."
  • "We are using the open-source version of Apache Airflow."
  • "The pricing for the product is reasonable."

Apache Airflow Reviews

Sudhir Ganti
Engineering Manager - OTT Platform at Amagi
Real User
Top 10
Helps us maintain a clear separation of our functional logic from our operational logic

Pros and Cons

  • "The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow."
  • "One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the steps of any workflow are stateless. Not every workflow can be implemented within Airflow."

What is our primary use case?

We are a technology, media, and entertainment-technology company. We are using Apache Airflow for architecting our media workflows. We are using it for two major workflows.

We have had it set up for some time on our own cloud. Recently, we migrated the setup to AWS.

How has it helped my organization?

Airflow is our first choice because we wanted a clear separation of our functional logic from our operational logic. We don't want our microservices to have the cross-cutting responsibilities of our operational logic. Right now, our microservices are the core business' inner functional logic. The majority of our distribution, our decision making, and the majority of our workflow operational responsibilities have been added to Airflow.

What is most valuable?

The reason we went with Airflow is its DAG presentation, that shows the relationships among everything. It's more of a configuration-driven workflow. 

It's all Python, as well. The majority of the configuration is Python-friendly.

What needs improvement?

One specific feature that is missing from Airflow is that the steps of your workflow are not pipelined, meaning the steps of any workflow are stateless. Not every workflow can be implemented within Airflow. For example, Step 1 of my workflow will have output which I definitely want to be provided automatically as an input to my Step 2. At the workflow level, we want to have common state management where, across steps, we'll be able to reach the state information. Right now, we're using an external state repository to maintain the state.

If Airflow could come up with some kind of implementation where not every step of the pipeline is an independent step, that would be helpful. I would like it if part of the output of your previous steps could be input for your next step. That kind of pipeline is missing. When we consider other products like jBPM, Camunda, or Cadence, they have the concept of pipelining.
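
The kind of step-to-step hand-off described above can be sketched in plain Python as a small shared store keyed by the step that produced each value. This is only an illustration under assumptions: the `RunState` class, the step functions, and the `asset_id` value are hypothetical and not part of Airflow. Airflow's own XCom feature, which another reviewer on this page mentions, serves a similar purpose for small values passed between tasks.

```python
class RunState:
    """Minimal in-memory state shared by the steps of one workflow run."""

    def __init__(self):
        self._values = {}

    def push(self, step, key, value):
        # Store an output under the step that produced it.
        self._values[(step, key)] = value

    def pull(self, step, key):
        # Read an output produced by an earlier step; KeyError if missing.
        return self._values[(step, key)]


def step_1(state):
    # Hypothetical first step: produce a value for later steps.
    state.push("step_1", "asset_id", "asset-123")


def step_2(state):
    # Hypothetical second step: consume step 1's output as its input.
    asset_id = state.pull("step_1", "asset_id")
    state.push("step_2", "message", f"processed {asset_id}")


state = RunState()
for step in (step_1, step_2):
    step(state)
result = state.pull("step_2", "message")
```

In a real deployment the store would need to be durable (for example a database, or an external state repository like the one the reviewer describes) rather than an in-memory dictionary.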

I would also like to see support for more platforms, in terms of programming BPMs. Cadence supports Golang and Java. Legacy components can be from any platform, so if they could provide more client support for Java client library and Golang, that would be helpful. I want it to program in Java.

For how long have I used the solution?

I have been using Apache Airflow for more than a year.

What do I think about the scalability of the solution?

It's definitely scalable.

We have been using Airflow for some time, but we are not heavily dependent on it. We only have a couple of use cases being executed by Airflow.

Because we have some data engineering problems, we have a good amount of analytics systems. We have a high volume of data that comes into our system, along with a lot of email, and we have to have an automated data pipeline. Given that, we have all these computing capabilities that are built of microservices. The beauty of it is its scalability. It has every step of your workflow, and it has scheduler capabilities. Every step of your workflow is delegated to one of your nodes. That is being scaled per your computing needs.

We are still evolving. Our business processes are not completely automatic. We're still in the process of identifying what all the automation cases are that we can bring under Airflow. We would like to leverage one common orchestrator or workflow BPM for our complete ecosystem. So we have some architects in our system who are happy with Airflow and others who would like to migrate to some other BPM like Cadence or Apache NiFi. There are a lot of orchestrators and we're just out of the gate. Airflow is still not being heavily used in our enterprise.

Which solution did I use previously and why did I switch?

This is the first workflow BPM tool that we are using in our platforms.

How was the initial setup?

There is comprehensive documentation for setting up a simple workflow and you just follow the documentation for setting things up. We're all engineers so we don't mind if the steps are lengthy, in terms of setting up the system. I'm quite okay with the documentation provided for getting your system up and running.

But I would appreciate it if they published a portal where we could see in what way other businesses, or other technology companies are solving their problems, with some case studies, using Airflow. It would help us to review their case studies. My biggest problem at the time when I was deciding whether Airflow fit our needs or not, was that I was looking for some case studies of technology companies that are already using the solution. With Camunda and jBPM, there is a good quantity of case studies available online.

Which other solutions did I evaluate?

There is no scarcity of BPMs. There are many products online: either open-source or community products or licensed products. There are many good BPMs. The reason that Airflow is in my system is that some of our workflows which we have onboarded are also on Python. Airflow complements that. But the first and foremost ability of any orchestrator should be to integrate with any underlying platform, be it a Java platform or a Python platform. That's the beauty of an orchestrator.

What other advice do I have?

We have a team of people, four to five team members, who initially evaluated Airflow and wanted to implement it.

We have customers onboarded on our legacy systems. I cannot disrupt the service and bring everything into Airflow. I have to onboard Airflow seamlessly, while I protect my current, ongoing business systems. So I'm trying to balance things here. We have only been able to onboard a couple of workflows. Eventually, we want to do it more fully, but there were a few challenges as I told you: There is no pipeline to take information, which is forcing me to retain my state in a separate state repository. That would be the next big area where I would like to see improvement.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
JP
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees
Real User
Top 5
Integrates well with other pipelines and builds different processes well but the scalability needs improvement

Pros and Cons

  • "The product integrates well with other pipelines and solutions."
  • "The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not."

What is our primary use case?

We normally use the solution for creating a specific flow for data transformation. We have several pipelines that we use and due to the fact that they're pretty well-defined, we use it in conjunction with other tools that do the mediation portion. With Airflow, we do the processing of such data.

What is most valuable?

The product integrates well with other pipelines and solutions.

The ease of building different processes is very valuable to us. The difference between Kafka and Airflow is that Airflow is better for dealing with the specific flows where we want to do some transformation. It's very easy to create flows.

What needs improvement?

The graphics in the past have not been ideal.

We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.

The management integration was challenging as well. It requires a lot of work on our end. We were creating our own way to integrate things specifically with specific tools. There's not really an ease of management out-of-the-box option for integration. We needed to become a little bit creative to solve that ourselves.

The scalability of the solution itself is not as we expected. Being on the cloud, it should be easy to scale, however, it's not.

There is no SDC versioning. There's no version control for pipelines. We have to build several pipelines for several flows, yet there's no version control to generate them.

There's no Python SDK. We need to generate our own scripts and upload them and put them there. However, there's not a realistic case that we can get connected to them. On top of that, the API sets that are provided are very limited. They are not as rich as others. You cannot do much with them.

For how long have I used the solution?

I've been using the solution for maybe three years at this point. It hasn't been too long.

What do I think about the stability of the solution?

The solution is largely stable. Obviously when you start creating more use cases, then you realize the limitations, however, it's not really, really bad.

What do I think about the scalability of the solution?

Due to the fact that the solution is on the cloud, we thought it would be fairly easy to scale. This is proving not to be the case and scalability is limited.

The challenging part is to make it really flexible in a cloud-native environment. With other applications, what you have there is the scalability that can be sensitive to your needs, based on the amount of data you are putting into the flow.

Instead of you having to create your own logic to scale it up, it should be a little more efficient on how it gets integrated into the whole environment. You have to get a little bit creative and put some commands and some logic in there and be monitoring everything. You build everything - versus other options that are more out of the box. With other solutions, if you have these bursts of data they ultimately can scale up and they are more native.

How are customer service and technical support?

Technical support has been pretty good. We don't really have anything to complain about. We're satisfied with the service so far.

Which solution did I use previously and why did I switch?

For this particular category, we tested the other tools and they were more than what we needed. We had also used other products in other projects, and nothing really worked for us. Because Airflow is a bit different, we decided that it was a nice player and a good open-source tool.

We do use other tools. However, this one seems to work quite well for us.

How was the initial setup?

The initial setup isn't as straightforward as we hoped. It's not as flexible as other options. You need to be a bit creative during the process.

What's my experience with pricing, setup cost, and licensing?

This product is open-source.

What other advice do I have?

We're just customers and end-users. We don't have a special business relationship with Apache.

I'm not sure of which version of the solution we're using. It's likely the most up-to-date, or at the very most back two or three versions as we are not using any of the older versions.

I'd advise others considering the solution to first understand what exactly you're trying to achieve. Otherwise, you either select a non-cloud-native Apache workflow manager or select something that is way too big for what you are actually trying to achieve. Understand exactly what you need, the volumes that you need, and what exactly the use cases are.

After that, in terms of deployment, that depends on exactly what you are trying to do. If all of your solutions are cloud-native, try to do it with a cloud-native tool. Specifically, go to the CMCS site and look into the solutions that are there. Those have been tested, at least for the cloud-native solutions that exist.

Then, just make sure that the components you have will match and will be available to whatever you're trying to build. For example, the user management is something that is important for us and for this specific setup. Probably for some others, it's not going to be. 

Take into consideration, what are the different connection points and make sure that they are either supported or that you can support the integration of such items. You need to have a proper developer that can help you build your connector or your API.

In general, I would rate the solution at a seven out of ten. If they fix the APIs and the price on LTK, I'd rate it closer to a nine.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
MW
Assistant Manager at a comms service provider with 10,001+ employees
Real User
Top 10
Comes with direct support for Python, letting us easily automate our pipelines

Pros and Cons

  • "The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot."
  • "We're currently using version 1.10, but I understand that there's a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult."

What is our primary use case?

There are a few use cases we have for Apache Airflow, one being government projects where we perform data operations on a monthly basis. For example, we'll collect data from various agencies, harmonize the data, and then produce a dashboard. In general, it's a BI use case, but focusing on social economy.

We concentrate mainly on BI, and because my team members have strong technical backgrounds we often fall back to using open source tools like Airflow and our own coded solutions. 

For a single project, we will typically have three of us working on Airflow at a time. This includes two data engineers and a system administrator. Our infrastructure model is hybrid, based both in the cloud and on-premises. 

What is most valuable?

The best part of Airflow is its direct support for Python, especially because Python is so important for data science, engineering, and design. This makes the programmatic aspect of our work easy for us, and it means we can automate a lot.

It's such a natural fit because our engineers are also Python-based, and I think we also quite like that we don't have to learn different kinds of UIs. Airflow is based on standard software packages, so we don't have to learn anything new in the way of opinionated UIs from different vendors.

What needs improvement?

We're currently using version 1.10, but I understand that there's a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.

When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly. 

The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I could suggest more improvements in the visual UI. We want to do the ETL as code, but having a nice visual UI to facilitate this process would be great. Because that means we can also rely on non-technical staff, rather than just the three solid technical staff we have here. If there were better features for the UI, like drag-and-drop, then we could expand its use to more of our team.

For how long have I used the solution?

I've been using Apache Airflow for about two and a half years. 

What do I think about the stability of the solution?

I think how Apache Airflow works is great. We like the paradigm of ETL as code, which means you define your pipeline as code. All the while, people talk about infrastructure as code, so the practice of ETL as code really fits into that philosophy.

What do I think about the scalability of the solution?

We can scale it well, and it runs on cloud, too. It's compatible with cloud-native technologies like Kubernetes so it has no issues regarding elasticity.

How are customer service and technical support?

We contacted an Airflow developer for assistance once and it was a good experience.

Which solution did I use previously and why did I switch?

We like to explore different tools, mixing and matching them to our needs, but we have never really found any like Airflow that are to our liking. We tried looking into Talend and Alteryx but we didn't find them suitable to our style or approach.

How was the initial setup?

As a first-time user, it was complex and somewhat difficult to set up as there are many components to put together. You've got your data portion, your scheduler portion, your web server portion, etc., and you've got all these parts to set up at first.

The next project that you get to, it gets easier. You really need to acquire a feel for what you're doing, and once you get over that, it's not too bad.

What about the implementation team?

We implemented Airflow ourselves, with the help of our two in-house data engineers and system administrator. It took around three months to get it deployed initially, from concept into production. Then after that, the goal is just to operate it and keep it running.

What's my experience with pricing, setup cost, and licensing?

Although Airflow is open source software, there's also commercial support for it by Astronomer. We personally don't use the commercial support, but it's always an option if you don't mind the extra cost.

What other advice do I have?

I can recommend Apache Airflow, especially if there are serious data engineers on your team. If, on the other hand, you're looking to enable business users, then it's not suitable.

I would rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Hybrid Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Other
Disclosure: I am a real user, and this review is based on my own experience and opinions.
JR
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees
Real User
Top 10
Feature rich, open-source, and good for building data pipelines

Pros and Cons

  • "I like the UI rework, it's much easier."
  • "I would like to see it more friendly for other use cases."

What is our primary use case?

I'm a data engineer. In the past, I used Airflow for building data pipelines and to populate data warehouses. With my current company, it's a data product or datasets that we sell to biopharma companies.

We are using those pipelines to generate those datasets.

What is most valuable?

I like the UI rework, it's much easier.

I use XCom for derived variables that need to pass between tasks. I don't really tend to use it for passing data, but only for a derived variable, so that, for example, I don't have to re-query something every time a task uses it. I use the JSON conf for overwriting certain parameters.

In our use cases, some of the inputs of the dataset are files that we pull out of S3. Sometimes they need to redo those files, but we don't need to change any logic; we just need to redo the builds. Rather than redeploying the code to point to a new S3 bucket, we overwrite it to point to a different S3 key.
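
The override pattern described above can be sketched as merging a run-time JSON document over a pipeline's default parameters. This is a plain-Python illustration, not Airflow's API; the parameter names (`s3_bucket`, `s3_key`), their values, and the `resolve_params` helper are hypothetical.

```python
import json

DEFAULT_PARAMS = {
    "s3_bucket": "example-bucket",        # hypothetical default bucket
    "s3_key": "inputs/latest/data.csv",   # hypothetical default key
}


def resolve_params(defaults, conf_json=None):
    """Return defaults overridden by keys from an optional JSON string."""
    params = dict(defaults)  # copy so the defaults are never mutated
    if conf_json:
        overrides = json.loads(conf_json)
        unknown = set(overrides) - set(defaults)
        if unknown:
            # Fail loudly on typos instead of silently ignoring them.
            raise KeyError(f"unknown parameter(s): {sorted(unknown)}")
        params.update(overrides)
    return params


# Normal run: no overrides, so the defaults apply.
normal = resolve_params(DEFAULT_PARAMS)
# Re-run against a re-delivered file: only the key changes, no redeploy.
rerun = resolve_params(DEFAULT_PARAMS, '{"s3_key": "inputs/redo/data.csv"}')
```

Rejecting unknown keys is a design choice of this sketch; a more permissive pipeline could pass extra keys through instead.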

I have read that there are many different workflow pipelining tools in the biotech space, such as Snakemake and Nextflow.

There is also a CWL plugin that we may look into at some point. 

Eventually, we might have a use case where a researcher has a pipeline they run locally, and then we want to convert that to a DAG. 

The CWL-Airflow plugin would be useful for that. This might be something to look into later. But that would be like months, or maybe a year from now.

What needs improvement?

I am using a Celery Executor and I find that it crashes and I can't see any logs. I can only assume that it's a memory issue and have to blindly restart until eventually, it starts up again.

One of our use cases is triggered by input rather than run as a batch process. For example, we receive a batch of data and it goes through tasks one, two, and three; when a new batch comes in, each subsequent task should operate on just that batch's data from the prior task.

I am used to working in a way where the output gets written to a table and then the next task selects everything from that upstream table. It could be coded so that you are only writing the data for that portion of the task. It could handle state machines and state changes as opposed to the batch process.

I would like to see it more friendly for other use cases.

For how long have I used the solution?

In my current company, I just introduced it within the last couple of months. But I've used it at my prior two jobs as well.

We are using Version 2.0.1.

What's my experience with pricing, setup cost, and licensing?

We are using the open-source version of Apache Airflow.

What other advice do I have?

I usually create my own custom operators every time. We upgraded to 2.0, but I am not using any of the new features. 

I haven't yet used DAG of DAGs or the new way of using Python functions in the Python operator yet. But we might use DAG of DAGs eventually.

I love this solution and I would rate it a nine out of ten.

Which deployment model are you using for this solution?

Private Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
AJ
Associate Director - Technologies at a tech services company with 51-200 employees
Real User
Top 20
Quick and easy to set up, but the technical support needs to be improved

What is our primary use case?

Our primary use case is to integrate with SLAs.

What is most valuable?

The most valuable feature is the workflow.

What needs improvement?

Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.

In the future, I would like to see a single-click installation.

For how long have I used the solution?

We have been working with Apache Airflow for approximately one month.

What do I think about the scalability of the solution?

In our company, we are doing a POC and there are only three users. We have also implemented it for clients.

We do plan to increase our usage and the POC that we are now working on is something that we will implement for other clients if it works.

How are customer service and technical support?

We are not satisfied with technical support. We rely on using Google to identify solutions for the problems we have.

Which solution did I use previously and why did I switch?

We did not use another similar solution prior to Airflow.

How was the initial setup?

The initial setup was straightforward and it does not take long to complete. The deployment took no more than an hour.

Which other solutions did I evaluate?

We evaluated Control-M and another similar product from IBM.

What other advice do I have?

This is a good product and I definitely recommend it.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
CP
Virksomhedskonsulent - Digitalisering, Forretningsudvikling, BPM, Teknologi & Innovation at a consultancy with 51-200 employees
Real User
Top 10
Scalable, stable and simple installation

Pros and Cons

  • "We have been quite satisfied with the stability of the solution."
  • "The way the dashboard is connected into the BPM flow could be improved."

What is our primary use case?

We mainly used the solution in banking, finance, and insurance. We are looking for some opportunities in production companies, but this is only at the very early stages.

What is most valuable?

I do not have specific feedback because it is quite early in the review stage for comment.

What needs improvement?

The way the dashboard is connected into the BPM flow could be improved.

For how long have I used the solution?

I have been using the solution for half a year.

What do I think about the stability of the solution?

We have been quite satisfied with the stability of the solution.

What do I think about the scalability of the solution?

The scalability of the solution is good.

How are customer service and technical support?

We had no issue with technical support.

How was the initial setup?

The installation is straightforward.

What's my experience with pricing, setup cost, and licensing?

The pricing for the product is reasonable.

Which other solutions did I evaluate?

We are evaluating Camunda as well as this solution. We are investigating and trying to determine how suitable they are for production facilities. Additionally, we are looking at which types of processes the solutions are actually suitable for.

What other advice do I have?


We are unsure of which solution we will end up with; we are testing them currently. We are trying to get into new business types and new industries. We are looking into how well the solutions can be used in production facilities.

I rate Apache Airflow an eight out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Ariful Mondal
Consulting Practice Partner - Data, Analytics & Artificial Intelligence at Wipro Ltd
Real User
Expert, Moderator
Managing large-scale data pipelines and Python tasks has been made easy

We have been using Apache Airflow for the past 2 years for various use cases such as: 

  • Data Pipeline building and monitoring
  • Automation of data extraction processes and Intelligent Automation
  • Web Scraping at scale for financial services 

We manage large-scale data processing workloads using DAGs (directed acyclic graphs), a core concept of Airflow, which expedites error handling and logging. It has helped us manage complex workflows and the orchestration of tasks efficiently.

I found the following features very useful:

  • DAG - Workload management and orchestration of tasks using 
  • TaskFlow API - working with Python tasks has been made easy, with cleaner DAGs using the @task decorator in Python
  • Connection and Hooks - interface to connect external systems

To implement Airflow's various useful functionalities effectively, you need to be a very good Python programmer. The UI could be improved with additional user-friendly features for non-programmers, so that less coding expertise is required.

Disclosure: I am a real user, and this review is based on my own experience and opinions.