Apache Flink Room for Improvement

Ilya Afanasyev - PeerSpot reviewer
Senior Software Development Engineer at Yahoo!

The solution could be more user-friendly. The debugging tools could also be improved in a future release.

PrashantVaghela - PeerSpot reviewer
Principal Engineer at InnovAccer Inc.

One way to interact with Flink is through a tool called PipeLINK for writing Flink code, which doesn't require you to use Python directly.

Flink also offers a Python API called PyFlink, which is designed specifically for writing Flink code. It provides a simpler and more accessible way to write Flink code than the Java or Scala APIs.

PyFlink is not as fully featured as Python itself, so there are some limitations to what you can do with it. This is an area for improvement.

However, it is a good choice for users who are not familiar with Java or Scala.

Armando Becerril - PeerSpot reviewer
Partner / Head of Data & Analytics at Kueski

One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms, similar in scope to how Apache Flink works with Cloudera. Apache Flink is part of the same ecosystem as Cloudera, and for batch processing it's actually very useful. For real-time processing, however, there could be more development of big data capabilities across the various ecosystems out there.

I am also looking for more possibilities in terms of what can be implemented in containers rather than in Kubernetes. I think our architecture would work really well with more options available to us in this respect.

Finally, it's a challenge to find people with the appropriate skills for using Flink. There are a lot of people who know what should be done better in big data systems, but there are still very few people with Flink capabilities.

Buyer's Guide
Apache Flink
April 2024
Learn what your peers think about Apache Flink. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
768,578 professionals have used our research since 2012.
RA
Sr. Software Engineer at a tech services company with 10,001+ employees

In Flink, maintaining the infrastructure is not easy. You have to design the architecture well. If you want to scale to larger volumes of streaming data, you need good machines. You need a resilient architecture so that if a job fails, you can recover with minimal downtime. You should have good storage systems to store and retrieve intermediate Flink state (in the case of stateful applications). Basically, you face all the problems that come with a distributed system, so you need all of that infrastructure in place for Flink to perform well. The best approach is to look at the use cases you wish to support 5-10 years ahead and design the architecture around Flink accordingly.

Sunil Morya - PeerSpot reviewer
Consultant at a tech vendor with 10,001+ employees

The issue we had with Flink was that when you had to reference the schema of the input data stream, it had to be done directly in code. The schema, which was kept in XLS format, had to be stored in Python. If the schema changes, you have to redeploy Flink because the basic tasks and jobs are already running. That's one disadvantage.

Another was a restriction with Amazon's CloudFormation templates, which don't allow direct deployment into a private subnet. You have to deploy into the public subnet and then, from the Amazon console, specify a different private subnet, which requires a lot of settings. In general, the integration with Amazon products was not good and was very time-consuming. I'd like to think that has changed.

ZHIZHENG - PeerSpot reviewer
Product Operations Manager at OKX

Apache Flink's documentation should be available in more languages.

AC
CTO at ReNew

Apache Flink should improve its data capability and data migration. 

MP
Lead Data Scientist at a transportation company with 51-200 employees

There is room for improvement in the initial setup process. I found myself spending a significant amount of time navigating through documentation to configure it. It would be beneficial to have streamlined commands, where the environment could be quickly initialized, including database setup, providing a more efficient and convenient starting point for projects. This would be particularly advantageous for development and experimentation, allowing more focus on feature testing rather than spending time on the setup process.

JR
Sr Software Engineer at a tech vendor with 10,001+ employees

We have a machine learning team that works with Python, but Apache Flink does not have full support for the language. We needed to use Java to implement some of our job posting pipelines.

BH
Lead Software Engineer at a tech services company with 5,001-10,000 employees

The TimeWindow feature needs attention. The way event timing and windowing work changed somewhat in 1.11, where they introduced watermarks.

A watermark basically associates data in the stream with a timestamp. You can refer to the documentation for details. They have updated the rest of the documentation but not the testing documentation, so we have to experiment manually to understand a few concepts.
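
To make the watermark idea concrete, here is a minimal conceptual sketch in plain Python. It does not use Flink's API; the names `Event`, `BoundedOutOfOrdernessWatermark`, and `split_late` are hypothetical, and the rule shown (watermark equals the highest timestamp seen minus a fixed bound) is a simplified assumption about how a bounded-out-of-orderness strategy behaves.

```python
from dataclasses import dataclass


@dataclass
class Event:
    key: str
    timestamp_ms: int  # event time carried by the record itself


class BoundedOutOfOrdernessWatermark:
    """Hypothetical helper: trails the highest event time seen by a fixed bound."""

    def __init__(self, max_out_of_orderness_ms):
        self.bound = max_out_of_orderness_ms
        self.max_ts = None  # no events observed yet

    def on_event(self, event):
        # Remember the largest event timestamp seen so far.
        if self.max_ts is None or event.timestamp_ms > self.max_ts:
            self.max_ts = event.timestamp_ms

    def current(self):
        # The watermark asserts: "no event older than this is expected anymore".
        return None if self.max_ts is None else self.max_ts - self.bound


def split_late(events, max_out_of_orderness_ms):
    """Separate events that arrive behind the current watermark from the rest."""
    watermark = BoundedOutOfOrdernessWatermark(max_out_of_orderness_ms)
    on_time, late = [], []
    for event in events:
        current = watermark.current()
        if current is not None and event.timestamp_ms < current:
            late.append(event)
        else:
            on_time.append(event)
        watermark.on_event(event)
    return on_time, late
```

With a bound of 1000 ms, an event stamped 500 that arrives after an event stamped 3000 falls behind the watermark (2000) and is treated as late, while an event stamped 2500 is still accepted.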

Integrating Apache Flink with metric services or failure-handling tools either needs some kind of update or demands in-depth knowledge of those tools beforehand. Consider a use case where you want analytics on how much data you have processed and how many records failed. Prometheus is one of the common metric tools that Flink supports out of the box, along with other metric services. The documentation is straightforward, but there is a learning curve with metric services, which can consume a lot of time if you are not well versed in those tools.

Flink provides basic documentation on failure handling, such as restart on task failure, fixed-delay restart, etc.
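
As an illustration of what a fixed-delay restart policy means, the following plain-Python sketch retries a failing task a limited number of times with the same pause between attempts. It is only a conceptual model, not Flink's implementation or configuration; `run_with_fixed_delay_restart` and its parameters are hypothetical names.

```python
import time


def run_with_fixed_delay_restart(task, restart_attempts, delay_seconds, sleep=time.sleep):
    """Run task(); on failure, restart up to restart_attempts times.

    Waits the same fixed delay before every restart. Returns a tuple of
    (task result, number of restarts used). Re-raises the last error once
    the restart budget is exhausted.
    """
    restarts = 0
    while True:
        try:
            return task(), restarts
        except Exception:
            if restarts >= restart_attempts:
                raise  # budget exhausted: the job is considered failed
            restarts += 1
            sleep(delay_seconds)  # fixed pause, no backoff
```

The `sleep` parameter is injectable so the policy can be exercised in tests without actually waiting.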

SD
Software Architect at a tech vendor with 501-1,000 employees

The state maintains checkpoints and they use RocksDB or S3. They are good but sometimes the performance is affected when you use RocksDB for checkpointing.

We can write Python bolts/applications inside Apache Storm code, and Storm supports Python as a programming language, but with Flink, the Python support is not that great. When we do machine learning or data science work, we want to integrate the data science or machine learning pipeline with our real-time pipeline, and most of that work is in Python.

It was very easy with Storm. Storm supports Python natively, so integration was easy. But Flink is mostly Java. Integrating Python with Java is difficult, so it's not a direct integration. We needed to find an alternative way. We created an API layer in between so that the Java and Python layers communicate through an API. We just call the data science or ML models through the API, which runs in Python while Flink runs in Java. We would like to see another way to run it. Python support is currently there, but it's not that great. This is an area where we would like to see improvement.
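
The Python half of such an API layer could look roughly like the sketch below, which uses only the standard library. It is an assumption about the general shape of the approach, not the reviewer's actual service: `predict`, `ScoringHandler`, `make_server`, and the JSON payload format are all hypothetical.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(features):
    """Hypothetical stand-in for a real data science model."""
    return {"score": sum(features) / len(features) if features else 0.0}


class ScoringHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON like {"features": [1, 2, 3]} and returns a score."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length) or b"{}")
            body = json.dumps(predict(payload["features"])).encode("utf-8")
            status = 200
        except (ValueError, KeyError, TypeError):
            body = json.dumps({"error": "expected JSON with a 'features' list"}).encode("utf-8")
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # keep the sketch quiet


def make_server(host="127.0.0.1", port=0):
    # port=0 lets the operating system pick a free port.
    return HTTPServer((host, port), ScoringHandler)
```

A Java-based streaming job would then call this endpoint over HTTP for each record or batch. The trade-off is an extra network hop per call in exchange for keeping the model code in Python.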

VI
Principal Software Engineer at a tech services company with 1,001-5,000 employees

In terms of improvement, there should be better reporting. You can integrate with reporting solutions, but Flink doesn't offer reporting itself.

They're more about the processing side. Low latency processing is out of their scope. As far as low latency is concerned, you can integrate with other backend solutions as well. They have that flexibility. The APIs are good enough. Its in-memory processing is so fast that you can have data ready sooner.

JV
Head of Data Science at an energy/utilities company with 10,001+ employees

I am using the Python API and I have found the solution to be underdeveloped compared to others. There needs to be better integration with notebooks to allow for more practical development. Additionally, there are no managed services. For example, on Azure, you would have to set everything up yourself.

In a future release, they could make the error descriptions clearer.

RP
Software Development Engineer III at a tech services company with 5,001-10,000 employees

Flink has become a lot more stable, but the machine learning library is still not very flexible. Some models cannot simply be plugged in and used. In order to use some of the libraries and models, I need a Python library, because there might be pre-processing or post-processing requirements, or I may need Python just to parse and use the models. The lack of Python support is something they could work on in the future.

Ertugrul Akbas - PeerSpot reviewer
Manager at ANET

There is a learning curve. It takes time to learn.

The initial setup is complex and could be simplified.
