What is our primary use case?
For services that need real-time, fast updates and have a lot of data to process, Flink is the way to go. Apache Flink with Kubernetes is a good combination. Data transformation, grouping, keying, and state management are some of the features of Flink. My use case is to provide the latest data as quickly as possible, in real time.
How has it helped my organization?
The main advantage is the turnaround time, which has been reduced drastically by Apache Flink. Earlier, processing used to take a long time, but now everything happens in near real time. We get the latest data in very little time, and there is no waiting or lag of data in the application. Time has been one of the important factors.
The other factor is memory. Machine utilization has been more efficient since we started using this solution. Big data applications typically use a large group of machines to process the data, and these machines are not always optimally utilized; some of them might not be required, but they still hold on to resources. In Kubernetes, we can cap the resources we provide, and in Apache Flink, there is a configuration for deploying on a single cluster node. Scalability is quite flexible in Flink through task managers and resource configuration.
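The task-manager and resource knobs mentioned above live in Flink's `flink-conf.yaml`. A minimal sketch is below; the specific values (4 slots, 2 GB per task manager) are illustrative assumptions, not recommendations:

```yaml
# flink-conf.yaml (illustrative values only)
# Number of parallel task slots each TaskManager offers.
taskmanager.numberOfTaskSlots: 4
# Default parallelism for jobs that do not set their own.
parallelism.default: 4
# Total memory budget per TaskManager process, so Kubernetes
# resource limits and Flink's own accounting stay aligned.
taskmanager.memory.process.size: 2g
```

Tuning slots and memory together with the pod resource requests/limits in Kubernetes is what keeps the machines from being held but underused.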
What is most valuable?
MapFunction and FilterFunction are the most useful, and most used, functions in Flink. Data transformation becomes easy. For example, in an application that stores information about people, when we want to retrieve those people's details in some relation, we can filter all persons by their age group in the map function. That is why the mapping function is very useful. This could be helpful in analytics, for targeting specific news to a specific age group.
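The age-group example can be sketched in plain Python. In a real job these would be FilterFunction/MapFunction instances applied to a DataStream; the record fields and age range here are illustrative assumptions only:

```python
# Plain-Python sketch of Flink-style filter + map transformations.
# In Flink, the filter would be a FilterFunction and the projection
# a MapFunction chained on the DataStream.

people = [
    {"name": "Asha", "age": 17},
    {"name": "Bo", "age": 34},
    {"name": "Chen", "age": 29},
]

# "FilterFunction": keep only the 25-40 age group.
adults = [p for p in people if 25 <= p["age"] <= 40]

# "MapFunction": project each record down to the field
# the downstream analytics step needs.
names = [p["name"] for p in adults]

print(names)  # ['Bo', 'Chen']
```

The same filter-then-map shape is what lets analytics target content at one age group without touching the rest of the stream.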
What needs improvement?
The TimeWindow feature could be improved. Event timing and windowing changed a bit in 1.11, where watermarks were introduced.
A watermark basically associates data in the stream with a timestamp. The documentation can be referred to, but while the rest of the documentation has been updated, the testing documentation has not. Therefore, we have to experiment manually and work out a few concepts ourselves.
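To make the watermark idea concrete, here is a hand-rolled simulation of the bounded-out-of-orderness strategy (in Flink 1.11 this is `WatermarkStrategy.forBoundedOutOfOrderness`): the watermark trails the maximum event timestamp seen so far by a fixed allowance. This is a toy sketch, not the Flink API:

```python
# Toy simulation of a bounded-out-of-orderness watermark.
# Watermark after each event = max event time seen so far - allowed lateness.

MAX_OUT_OF_ORDERNESS = 5  # seconds of lateness we tolerate

def watermarks(event_times, delay=MAX_OUT_OF_ORDERNESS):
    """Yield the watermark emitted after each incoming event."""
    max_seen = float("-inf")
    for t in event_times:
        max_seen = max(max_seen, t)
        yield max_seen - delay

events = [10, 12, 11, 20, 15]  # event-time seconds, slightly out of order
print(list(watermarks(events)))  # [5, 7, 7, 15, 15]
```

Once the watermark passes the end of a time window, Flink considers that window complete and fires it, which is why getting the lateness allowance right matters.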
Integration of Apache Flink with other metric services or failure-handling data tools needs some updating, or at least in-depth knowledge of those tools is expected before integrating. Consider a use case where you want analytics on how much data you have processed and how much has failed. Prometheus is one of the common metric tools supported by Flink out of the box, along with other metric services, and the documentation is straightforward. Even so, there is a learning curve with metric services that can consume a lot of time if you are not well versed in those tools.
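For reference, wiring up the Prometheus reporter is a few lines in `flink-conf.yaml`. This assumes the `flink-metrics-prometheus` jar is on the classpath; the reporter name `prom` and the port are illustrative choices:

```yaml
# flink-conf.yaml: expose Flink metrics to a Prometheus scraper
metrics.reporter.prom.class: org.apache.flink.metrics.prometheus.PrometheusReporter
# Port the reporter's HTTP endpoint listens on for scrapes.
metrics.reporter.prom.port: 9249
```

The Flink side is the easy part; most of the learning curve is on the Prometheus side (scrape configs, dashboards, alerting rules).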
Flink provides basic documentation for failure handling, such as restart on task failure and fixed-delay restart.
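The fixed-delay restart strategy, for example, is configured in `flink-conf.yaml`; the attempt count and delay below are illustrative values:

```yaml
# flink-conf.yaml: retry a failed job a few times before giving up
restart-strategy: fixed-delay
# How many restart attempts before the job is declared failed.
restart-strategy.fixed-delay.attempts: 3
# Pause between attempts, so a transient outage can clear.
restart-strategy.fixed-delay.delay: 10 s
```

Combined with checkpointing, this is what makes restarts after a task failure largely automatic.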
For how long have I used the solution?
I have been using Apache Flink for almost nine months.
What do I think about the stability of the solution?
The uptime of our services has increased and resources are better utilized. Restarts are automated on failures, alerts are triggered when the infrastructure breaches a threshold, and application failure metrics are logged, all of which has made the application robust. Scaling can be tweaked to match application usage and business rules, and Flink's parameters are configurable. Altogether, it has made our application more stable and maintainable.
What do I think about the scalability of the solution?
I haven't actually run many performance or stress tests to see how it scales.
There is no detailed documentation for scaling, and no one-size-fits-all solution; it depends on the use case. The Flink documentation describes REST APIs for scaling your tasks dynamically, but I haven't personally tried them yet.
How are customer service and technical support?
I haven't actually used their technical support yet.
Which solution did I use previously and why did I switch?
I have tried batch processing, but it was not very effective for my use case. It was time-consuming and not a real-time solution.
How was the initial setup?
The initial setup is straightforward. A new joiner (with some experience in the software industry) would not find it difficult to understand and then contribute to the application. Things get tricky and more complex when you start writing code, because if you are not familiar with what Apache Flink provides and what you need to do, it is very difficult. If you follow the documentation, there are various examples as well as different deployment strategies.
What was our ROI?
It is a good solution for saving time and cost.
What's my experience with pricing, setup cost, and licensing?
Being open source, licensing cost is not a factor, and the community support is strong.
What other advice do I have?
This is a good way to get your hands wet with streaming and big data processing applications, to understand the basic concepts of big data processing, and to see how complex analytics can be made simple. For example, if you want to analyze tweets or patterns, a simple use case is to use the flink-twitter-connector and provide it as your input source to Flink. The stream of random tweets keeps coming in, and you can then apply your own grouping, keying, and filtering logic to understand the concepts.
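The grouping/keying step you would apply to such a tweet stream can be sketched in plain Python (in Flink: `keyBy` on the hashtag, then a windowed count). The sample tweets and the hashtag extraction are illustrative assumptions:

```python
# Plain-Python sketch of keying tweets by hashtag and counting per key.
# In a Flink job this would be keyBy(...) followed by a window aggregate
# over the twitter-connector source stream.
from collections import Counter

tweets = [
    "loving #flink for stream processing",
    "#kubernetes plus #flink is a great combo",
    "batch jobs felt slow #bigdata",
]

def hashtags(text):
    """Naive hashtag extraction: any whitespace-separated token starting with '#'."""
    return [w for w in text.split() if w.startswith("#")]

# Group by key (hashtag) and count occurrences across the stream.
counts = Counter(tag for t in tweets for tag in hashtags(t))
print(counts.most_common(1))  # [('#flink', 2)]
```

Swapping the static list for a live connector source is essentially all that changes when you move this idea into a real Flink pipeline.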
An important thing I learned while using Flink is that the basic concepts of windowing, transformation, and the DataStream API should be clear, or you should at least be aware of what is going to be used in your application; otherwise you might end up increasing processing time rather than decreasing it. You should also understand your data, process, pipeline, and flow: is Flink the right candidate for your architecture, or is it overkill?
It is flexible and powerful is all I can say.
Which version of this solution are you currently using?