Apache Airflow Room for Improvement

SUDHIR KUMAR RATHLAVATH - PeerSpot reviewer
Student at University of South Florida

One improvement could be a plugin with a drag-and-drop feature. This graphical feature would be beneficial when dealing with connectivity and integration services, such as connecting to BigQuery or other systems. Although documentation is available, a drag-and-drop interface within the portal would be more user-friendly for first-time users like me. Users could simply drag and drop components to build a kind of pseudo-code, making it more flexible and intuitive.

Therefore, I suggest having a drag-and-drop feature for a more user-friendly experience and better code management.  

Moreover, admins need improved logging capabilities. Apache Airflow does have logging, but it is limited to some database data. It would be better if everything went to the server where Airflow is hosted, probably at the interface level. That would also help the developers.

FB
Product Owner at La Poste S.A.

The automation capabilities could be improved; a visual workflow designer and a graphical tool to reduce coding would be very helpful. But for now, it's sufficient for our simple workflows.

Damian Bukowski - PeerSpot reviewer
Program Python at Santander Bank Polska

The only thing I would like Apache to do is introduce an Oracle database integration, because Airflow currently supports primarily Postgres and MySQL. Many companies use Oracle as a production database; it is a paid product rather than a free one, and it offers more extensive support. Even though Apache Airflow uses Python, and Python has modules for Oracle databases, it would be safer and more convenient to connect through Apache Airflow itself rather than through Python scripts. I want to see Apache Airflow offer more integrations with production-grade databases, since that is an area where the product currently falls short.

Buyer's Guide
Apache Airflow
April 2024
Learn what your peers think about Apache Airflow. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
767,995 professionals have used our research since 2012.
MW
Analytics Solution Manager at Telekom Malaysia

Onboarding new people is an area for improvement. It should be simpler for newcomers; otherwise, we have to assign a senior engineer to operate it.

Miodrag Milojevic - PeerSpot reviewer
Senior Data Architect at Yettel

The current pricing of Apache Airflow is considerably higher than we anticipated; it has changed from its initial pricing structure, which caught us off guard. An improved pricing structure would be beneficial, and further enhancing the interface would also be highly beneficial.

Ravan Nannapaneni - PeerSpot reviewer
Senior Lead Engineer at Oliver Wyman

We have faced scenarios where Apache Airflow becomes non-responsive, leading to job failures. To resolve such situations, we had to manually reboot Apache Airflow since it doesn't provide an option to restart within the application. This necessitated modifying some configurations to initiate a restart of all Apache Airflow components. Although Apache Airflow is generally dependable, it may occasionally encounter glitches that can disrupt production flows and batches.

Punit_Shah - PeerSpot reviewer
Director at Smart Analytica

Enhancements become necessary when scaling from a few thousand workflows to five or ten thousand. At that point, resource management and threading become critical, which means optimizing how resources and threads are used within the Kubernetes VM ecosystem.

UjjwalGupta - PeerSpot reviewer
Module Lead at Mphasis

The solution lacks certain features. It supports only batch jobs, not real-time jobs. ETL pipelines can run either as batch jobs or as real-time jobs; real-time jobs run continuously rather than on a schedule, whereas Apache Airflow is designed for scheduled jobs. Being able to run real-time jobs would be a good improvement. Many connectors are available in the product, but some are still missing, and we have to build a custom connector when one is not available. The solution should have more built-in connectors.

SabinaZeynalova - PeerSpot reviewer
Data Engineer Team Lead at Unibank

Airflow is a pipeline for transferring clients' code, but Apache Airflow has no solution for model experiments. It needs more features for the experimental evolution steps.

ManojKumar43 - PeerSpot reviewer
Big Data Engineer at BigTapp Analytics

Airflow should support dynamic DAG creation.

PA
Senior Data Engineer at a photography company with 11-50 employees

The problem with Apache Airflow is that it is an open-source tool. You have to build it into a Kubernetes container, which is not easy to maintain, and I find it to be very clunky.

Additionally, there is room for improvement with DAGs. I had a very hard time building DAGs in Apache Airflow, so I decided to use Astronomer, which is built on top of Apache Airflow and is supposed to make your life easier. The best part of the solution is that third-party add-on, Astronomer.

It would be a very nice tool if it were an entirely cloud-based solution. Apache Airflow is not so nice with a hybrid setup, where half of it is on-premises and half is in a cloud environment. It should integrate better with the outside world.

Pravin Gadekar - PeerSpot reviewer
Google Cloud Architect at Capgemini

The platform's stability needs improvement, particularly regarding occasional interruptions due to networking issues. It requires manual intervention to resume jobs. Additionally, while extending the code is possible, it sometimes necessitates creating custom plugins.

Mikalai Surta - PeerSpot reviewer
Head of Big Data Department at IBA Group

Apache Airflow should have better integration with cloud platforms.

AS
Associate Data Engineer at an outsourcing company with 201-500 employees

Programmatically, it's very good and has no real competitors, but you cannot develop anything in the Airflow UI; everything must be developed in code. Other tools have recently entered the market as competitors to Airflow, and they offer graphical programming options, whereas Airflow currently does not. All the DAGs you want to build must be coded in Python. You cannot drag and drop components to build a pipeline or orchestrate it. Airflow has a graphical interface, but only for administration, not for development.

It doesn't support Windows. That's a big drawback because a lot of people use Windows.

Nomena NY HOAVY - PeerSpot reviewer
Lead Data Scientist at MVola

Apache Airflow could be improved by integrating some versioning principles. Currently, we have to swap some tags in our flow. It would be useful to version the entire product at once and compare which scripts have changed from last year to this year, or from last month to this month.

For example, to version the flow for one project, we need to check it item by item to identify which tags and scripts changed. All of this has to be done manually.
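The manual comparison described above can be sketched in plain Python. This is a minimal illustration, not an Airflow feature: it assumes each snapshot of a project's scripts is available as a mapping from file path to file contents, and the snapshot format, file names, and the `fingerprint` and `diff_snapshots` functions are all hypothetical. Hashing each script means only a short fingerprint per file has to be stored for every version.

```python
import hashlib


def fingerprint(snapshot: dict[str, str]) -> dict[str, str]:
    """Map each script path to a SHA-256 hash of its contents."""
    return {path: hashlib.sha256(text.encode("utf-8")).hexdigest()
            for path, text in snapshot.items()}


def diff_snapshots(old: dict[str, str], new: dict[str, str]) -> dict[str, list[str]]:
    """List the scripts added, removed, or modified between two snapshots."""
    old_fp, new_fp = fingerprint(old), fingerprint(new)
    return {
        "added": sorted(new_fp.keys() - old_fp.keys()),
        "removed": sorted(old_fp.keys() - new_fp.keys()),
        "modified": sorted(p for p in old_fp.keys() & new_fp.keys()
                           if old_fp[p] != new_fp[p]),
    }


# Hypothetical snapshots of one project's flow, taken a month apart.
last_month = {"dags/etl.py": "extract(); load()", "dags/report.py": "build_report()"}
this_month = {"dags/etl.py": "extract(); transform(); load()", "dags/alerts.py": "notify()"}
print(diff_snapshots(last_month, this_month))
```

In practice, keeping DAG files in a version control system such as Git gives the same comparison across any two points in time.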

AT
Lead of Monitoring Tech at an educational organization with 1,001-5,000 employees

Adding more automated components in Apache Airflow for basic things like exporting the data would be helpful. Apache Airflow is not that easy to use, but we have gotten used to it.

VenugopalKathirvel - PeerSpot reviewer
Senior Member Of Technical Staff, Engineering Operations at VMware

Apache Airflow could be improved with the addition of more frameworks.

MW
Analytics Solution Manager at Telekom Malaysia

We're currently using version 1.10, but I understand that there are a lot of improvements in version 2. In the earlier version that we're using, we sometimes have problems with maintenance complexity. Actually using Airflow is okay, but maintaining it has been difficult.

When something fails, it's not that easy to troubleshoot what went wrong. Sometimes the UI becomes really slow and there's no easy way to diagnose the problem. For the most part, we have had to learn through trial and error how to operate it properly.

The UI is also not that attractive, and I feel that the user experience isn't that nice. Version 2 is supposedly better, but without having tried it, I would suggest more improvements to the visual UI. We want to do ETL as code, but a nice visual UI to facilitate this process would be great, because it would let us rely on non-technical staff as well, rather than just the three solid technical staff we have here. If the UI had better features, like drag-and-drop, we could expand its use to more of our team.

Fadi Bathish - PeerSpot reviewer
Project Manager at Siren Analytics

The following should be improved:

  • Dashboards
  • Security
  • Telemetry for logging, monitoring, and alerting purposes
  • Documentation

JR
Senior Software Engineer at a pharma/biotech company with 1,001-5,000 employees

I am using the Celery Executor, and I find that it crashes without leaving any logs I can see. I can only assume it's a memory issue, and I have to restart it blindly until it eventually starts up again.

One of our use cases is triggered by input rather than run as a batch process. For example, we receive a batch of data and it goes through tasks one, two, and three; when a new batch comes in, each subsequent task should operate only on the data from the prior task for that batch.

I am used to working with it such that each task's output gets written to a table and the next task selects everything from that upstream table. It could instead be coded so that you only write the data for that portion of the task. It could handle state machines and state changes, as opposed to the batch approach.
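The difference between the two handoff patterns can be sketched in plain Python. This is a hypothetical illustration, not Airflow code: the three task functions, the in-memory `upstream_table` list standing in for a database table, and the doubling and threshold logic are all assumptions. `run_batch` hands each task only the rows produced for the current batch, while `run_select_all` mimics downstream tasks that re-read everything upstream.

```python
# Hypothetical stand-in for the shared upstream table every batch is appended to.
upstream_table: list[dict] = []


def task_one(batch_id: int, records: list[int]) -> list[dict]:
    # Tag each incoming record with the batch it arrived in.
    return [{"batch": batch_id, "value": r} for r in records]


def task_two(rows: list[dict]) -> list[dict]:
    # Transform only the rows it is handed (doubling is an arbitrary example).
    return [{**row, "value": row["value"] * 2} for row in rows]


def task_three(rows: list[dict]) -> list[dict]:
    # Keep rows above an arbitrary threshold.
    return [row for row in rows if row["value"] > 2]


def run_batch(batch_id: int, records: list[int]) -> list[dict]:
    """Batch-scoped handoff: each task consumes only the prior task's output."""
    rows = task_one(batch_id, records)
    upstream_table.extend(rows)  # still persisted, e.g. for auditing
    return task_three(task_two(rows))


def run_select_all() -> list[dict]:
    """Table-based handoff: downstream tasks re-read everything upstream."""
    return task_three(task_two(upstream_table))


print(run_batch(1, [1, 2]))  # [{'batch': 1, 'value': 4}]
print(run_batch(2, [3]))     # [{'batch': 2, 'value': 6}]
print(run_select_all())      # batch 1 rows are processed again alongside batch 2
```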

I would like to see it become friendlier for other use cases.

Luiz Cesar Gosi - PeerSpot reviewer
Senior Analytics Engineer at TalkDesk

I have some issues with how the solution communicates data usage. Different DAGs use the same database or data set; sometimes we consume the same data and send it to a different place in a different DAG. In the UI, I want to be able to see that we use the same data set more than once.

YS
Software engineer at Naver Corp

The documentation does not precisely define the function of the operators, and some parts do not precisely explain the parameters and functions. I had to do some experiments to understand the operators, and we often need to experiment to understand how they work. The documentation must be improved.

Mahendra Prajapati - PeerSpot reviewer
Senior Data Analytics at a media company with 1,001-5,000 employees

The solution could be improved by simplifying the integration process and providing access to its support team to guide integration.

Anandhavelu Arumugam - PeerSpot reviewer
Technical Lead at a media company with 5,001-10,000 employees

Everything is in the Python framework now. I would like to see some no-code and drag-and-drop capabilities in Airflow.

We're expecting a few more improvements in the log generator. Currently, it's very clumsy.

Joaquin Marques - PeerSpot reviewer
CEO - Founder / Principal Data Scientist / Principal AI Architect at Kanayma LLC

The solution can be improved by creating a tool that allows us to do these kinds of things graphically instead of just writing scripts. Hence, the graphical user interface can be improved.

SG
Engineering Manager - OTT Platform at Amagi

One specific feature missing from Airflow is pipelining: the steps of a workflow are stateless, so not every workflow can be implemented within Airflow. For example, Step 1 of my workflow produces output that I want automatically provided as input to Step 2. At the workflow level, we want common state management so that state information is accessible across steps. Right now, we're using an external state repository to maintain the state.

It would be helpful if Airflow offered some kind of implementation where not every step of the pipeline is independent. I would like part of the output of previous steps to serve as input for the next step. That kind of pipeline is missing. Other products, such as jBPM, Camunda, and Cadence, have the concept of pipelining.
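The pipelining this reviewer describes can be sketched in plain Python: each step's output becomes the next step's input, and all steps read and write one workflow-level state dictionary instead of an external state repository. The `run_pipeline` helper and the three example steps are hypothetical; this is neither Airflow's API nor how jBPM, Camunda, or Cadence implement pipelining. (Within Airflow itself, small values are usually passed between tasks through XCom, which this sketch does not model.)

```python
from typing import Any, Callable

# A step receives the previous step's output plus the shared workflow state.
Step = Callable[[Any, dict], Any]


def run_pipeline(steps: list[Step], initial_input: Any) -> tuple[Any, dict]:
    """Run steps in order, feeding each step's output into the next one."""
    state: dict = {}  # workflow-level state visible to every step
    data = initial_input
    for step in steps:
        data = step(data, state)
    return data, state


def fetch(ids: list[int], state: dict) -> list[dict]:
    state["fetched"] = len(ids)
    return [{"id": i} for i in ids]


def enrich(records: list[dict], state: dict) -> list[dict]:
    # Reads state written by an earlier step instead of an external store.
    state["enriched"] = state["fetched"]
    return [{**r, "ok": True} for r in records]


def publish(records: list[dict], state: dict) -> int:
    state["published"] = [r["id"] for r in records]
    return len(records)


result, final_state = run_pipeline([fetch, enrich, publish], [7, 8])
print(result, final_state)  # 2 {'fetched': 2, 'enriched': 2, 'published': [7, 8]}
```

Passing the state explicitly to every step keeps each step testable on its own while still letting later steps see what earlier ones recorded.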

I would also like to see support for more platforms for programming BPMs. Cadence supports Golang and Java. Legacy components can come from any platform, so client libraries for Java and Golang would be helpful. I want to program it in Java.

JP
Senior Solutions Architect/ Software Architect at a comms service provider with 51-200 employees

The graphics in the past have not been ideal.

We have several areas where we feel they could improve in terms of being a little bit more flexible. One is implementation. Even though we customized it, there were some specific things we had to do with the image by itself.

The management integration was challenging as well and required a lot of work on our end. We had to create our own way to integrate with specific tools. There's no easy-to-manage, out-of-the-box option for integration, so we needed to get a little creative to solve that ourselves.

The scalability of the solution itself is not what we expected. Since it runs in the cloud, it should be easy to scale; however, it's not.

There is no SDC versioning. There's no version control for pipelines: we have to build several pipelines for several flows, yet there's no version control to generate them.

There's no Python SDK. We need to generate our own scripts, upload them, and put them in place. However, there's no practical way for us to connect to them. On top of that, the provided API sets are very limited. They are not as rich as others, and you cannot do much with them.

AN
Solution Architect at EPAM Systems

There are some drawbacks to this solution. The code does not cover all tasks in the data warehouse automation process. Currently, in production, we have a large installation with a complex workflow that includes hundreds of tasks. Most of them are dispatched by the existing engine, but not all.
For example, sometimes we need to create cycles in our workflow, but we are not able to, because Airflow supports only Directed Acyclic Graphs (DAGs).
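Why a cycle is ruled out can be shown with a topological sort: a scheduler needs an order in which every task runs after its upstream tasks, and such an order exists only when the dependency graph has no cycles. The sketch below uses Kahn's algorithm in plain Python; the task names and the `topological_order` function are illustrative assumptions and are not Airflow's internal implementation.

```python
from collections import deque
from typing import Optional


def topological_order(edges: dict[str, list[str]]) -> Optional[list[str]]:
    """Return an order in which every task runs after its upstream tasks,
    or None if the dependency graph contains a cycle (Kahn's algorithm).

    `edges` maps each task to the tasks that depend on it (its downstream tasks).
    """
    nodes = set(edges) | {d for downstream in edges.values() for d in downstream}
    indegree = {n: 0 for n in nodes}
    for downstream in edges.values():
        for d in downstream:
            indegree[d] += 1

    ready = deque(sorted(n for n in nodes if indegree[n] == 0))
    order: list[str] = []
    while ready:
        node = ready.popleft()
        order.append(node)
        for d in edges.get(node, []):
            indegree[d] -= 1
            if indegree[d] == 0:
                ready.append(d)

    # Tasks never emitted lie on, or downstream of, a cycle: no valid order exists.
    return order if len(order) == len(nodes) else None


acyclic = {"extract": ["transform"], "transform": ["load"]}
cyclic = {"extract": ["transform"], "transform": ["validate"], "validate": ["transform"]}
print(topological_order(acyclic))  # ['extract', 'transform', 'load']
print(topological_order(cyclic))   # None
```

Supporting loops inside a workflow therefore requires something beyond a plain DAG, such as re-running a sub-graph or expressing the loop at a higher level.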

We need to develop our own workflow description and notation because Apache Airflow does not provide some needed features out of the box. It is our understanding that it is limited by design.

We are waiting for the 2.0 version, which is expected to be much more mature than versions 1.8 through 1.10. We believe it will be better.

There should be some improvement to the Doc Management features within the UI. They should consider out-of-the-box Outlook integration, and the object model should be expanded to support cyclic graphs inside the workflow.

AJ
Associate Director - Technologies at a tech services company with 51-200 employees

Technical support is an area that needs improvement. The contact numbers should be readily available so that we can call to get support as required.

In the future, I would like to see a single-click installation.

CP
Business Consultant - Digitalization, Business Development, BPM, Technology & Innovation at a consultancy with 51-200 employees

The dashboard's connection to the BPM flow could be improved.
