Apache Flink Room for Improvement

Rahul Agarwal
Sr. Software Engineer at a tech services company with 10,001+ employees
In Flink, maintaining the infrastructure is not easy. You have to design your functions very well. If you want to scale to a larger number of outputs, you need good machines. You need a resilient architecture with good storage systems so that you can recover when something fails. Basically, you face all the problems that come with a distributed system, so you need all of that infrastructure in place for Flink to perform well.
Sandesh Deshmane
Solution Architect at a tech vendor with 501-1,000 employees
State is checkpointed using RocksDB or S3. Both are good, but performance is sometimes affected when you use RocksDB for checkpointing. With Apache Storm we can write Python bolts and applications inside Storm code, because it supports Python as a programming language, but with Flink the Python support is not that great. When we do data science or machine learning work, we want to integrate that pipeline with our real-time pipeline, and most data science and machine learning work is done in Python. This was very easy with Storm, which supports Python natively. Flink is mostly Java, and integrating Python with Java is difficult, so there is no direct integration and we had to find an alternative. We created an API layer in between so that the Java and Python layers communicate through an API: Flink runs in Java and calls the data science or ML models, which run in Python, through that API. Python support exists today, but it is not that great, and we would like to see a better way to run Python code with Flink.
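The API layer described in this review could look roughly like the sketch below. It is only an illustration and not the reviewer's actual code: the `/score` path, the `features` JSON field, and the `score` function (a placeholder that averages its inputs instead of running a real model) are all assumptions. The idea is that the Python side exposes the model over HTTP, and the Flink job on the Java side would send each record's features to it. The `call_score` helper stands in for that Java caller so the example can be run on its own.

```python
import json
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


def score(features):
    """Hypothetical stand-in for a real model: returns the mean of the inputs."""
    return sum(features) / len(features) if features else 0.0


class ScoreHandler(BaseHTTPRequestHandler):
    """Accepts POSTed JSON like {"features": [..]} and replies {"score": x}."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        try:
            payload = json.loads(self.rfile.read(length))
            body = json.dumps({"score": score(payload["features"])}).encode()
            status = 200
        except (ValueError, KeyError, TypeError):
            # Malformed JSON, missing field, or non-numeric features
            body = json.dumps({"error": "expected JSON with a 'features' list"}).encode()
            status = 400
        self.send_response(status)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)

    def log_message(self, fmt, *args):
        pass  # silence per-request logging in this sketch


def start_server(port=0):
    """Start the service on localhost in a background thread (port 0 = any free port)."""
    server = HTTPServer(("127.0.0.1", port), ScoreHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    return server


def call_score(port, features):
    """Stand-in for the streaming job's HTTP call to the Python service."""
    request = urllib.request.Request(
        f"http://127.0.0.1:{port}/score",
        data=json.dumps({"features": features}).encode(),
        headers={"Content-Type": "application/json"},
    )
    # Ignore any system proxy settings for this local call
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    with opener.open(request, timeout=5) as response:
        return json.loads(response.read())
```

A design like this keeps the model code in Python, at the cost of one network round trip per record (or per batch) and a second service to deploy and monitor, which is the kind of overhead the reviewer is describing.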
Jyala Rahul Jyala
Sr Software Engineer at a tech vendor with 10,001+ employees
We have a machine learning team that works with Python, but Apache Flink does not have full support for the language. We needed to use Java to implement some of our job posting pipelines.
Find out what your peers are saying about Apache, Amazon, VMware and others in Streaming Analytics. Updated: October 2020.
441,478 professionals have used our research since 2012.
Hitesh Baid
Senior Software Engineer at a tech services company with 5,001-10,000 employees
The TimeWindow feature: the handling of event timing and windowing changed somewhat in 1.11. They have introduced watermarks; a watermark essentially associates data in the stream with a timestamp. The documentation can be consulted for details. They have updated the rest of the documentation but not the testing documentation, so we had to try things manually to understand a few concepts. Integrating Apache Flink with metric services or failure-handling tools either needs some improvement or requires in-depth knowledge of those tools beforehand. Consider a use case where you want analytics on how much data you have processed and how many records failed. Prometheus is one of the common metric tools that Flink supports out of the box, along with other metric services, and the documentation is straightforward. However, the metric services themselves have a learning curve, which can consume a lot of time if you are not well versed in those tools. Flink provides basic documentation for failure handling, such as restart on task failure, fixed-delay restart, etc.
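To make the watermark idea above more concrete, here is a small plain-Python simulation. It does not use Flink's API and is only a simplified model under stated assumptions: the watermark trails the largest event timestamp seen so far by a fixed `max_out_of_orderness`, a tumbling window fires once the watermark reaches its end, and events that arrive after their window has fired are dropped as late. The function name and parameters are hypothetical.

```python
from collections import defaultdict


def tumbling_window_counts(events, window_size, max_out_of_orderness):
    """Count events per tumbling event-time window.

    events: iterable of (event_time, value) pairs in arrival order.
    Returns (fired, late): fired is a list of (window_start, count) in firing
    order; late holds events whose window had already fired on arrival.
    """
    open_windows = defaultdict(int)  # window_start -> count so far
    fired, late = [], []
    max_seen = float("-inf")         # largest event time observed so far

    for event_time, value in events:
        window_start = event_time - (event_time % window_size)
        watermark = max_seen - max_out_of_orderness
        if window_start + window_size <= watermark:
            # The watermark already passed this window's end: too late
            late.append((event_time, value))
            continue

        open_windows[window_start] += 1
        max_seen = max(max_seen, event_time)
        watermark = max_seen - max_out_of_orderness

        # Fire every open window whose end is at or before the watermark
        ready = sorted(s for s in open_windows if s + window_size <= watermark)
        for start in ready:
            fired.append((start, open_windows.pop(start)))

    # End of input: flush whatever is still open
    for start in sorted(open_windows):
        fired.append((start, open_windows.pop(start)))
    return fired, late
```

With a window size of 10 and an allowed out-of-orderness of 5, an event with timestamp 3 that arrives after timestamp 12 is still counted in the window starting at 0, because the watermark is only at 7 at that point. Once an event with timestamp 17 moves the watermark to 12, that window fires, and a later event with timestamp 2 is treated as late.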