
Apache Flink Overview

Apache Flink is the #4-ranked solution in Streaming Analytics tools. IT Central Station users give Apache Flink an average rating of 8 out of 10. Apache Flink is most commonly compared to Amazon Kinesis. The top industry researching this solution is Computer Software Company, accounting for 27% of all views.
What is Apache Flink?

Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments, perform computations at in-memory speed and at any scale.

Apache Flink is also known as Flink.


Apache Flink Customers
LogRhythm, Inc., Inter-American Development Bank, Scientific Technologies Corporation, LotLinx, Inc., Benevity, Inc.

Pricing Advice

What users are saying about Apache Flink pricing:
  • "This is an open-source platform that can be used free of charge."
  • "Apache Flink is open source so we pay no licensing for the use of the software."
  • "It's an open-source solution."

Apache Flink Reviews

Rahul Agarwal
Sr. Software Engineer at a tech services company with 10,001+ employees
Real User
Top 10
Scalable framework for stateful streaming aggregations

Pros and Cons

  • "Another feature is how Flink handles failures. It has something called the checkpointing concept. You're dealing with billions and billions of requests, so in a large distributed system, something is going to fail. Flink handles this by using the concept of checkpointing and savepointing, where they write the aggregated state into some separate storage. So in case of failure, you can basically recover from that state and come back."
  • "In terms of stability with Flink, it is something that you have to deal with every time. Stability is the number one problem that we have seen with Flink, and it really depends on the kind of problem that you're trying to solve."

What is our primary use case?

Initially, we created our own servers and then eBay created their infrastructure. Now it's deployed on the eBay cloud.

Our primary use case is real-time or near-real-time aggregations. Let's say, for example, that we compute counts, sums, minimums, maximums, and distinct counts for different metrics that we care about, but in real time. So let's say you have an e-commerce company and you want to measure different metrics. If I take the example of risk, let's say you want to check whether one particular seller on your site is doing something fishy or not. What is their behavior? How many listings do they have in the past five minutes, one hour, one day, or one year? You want to measure this over time.

This data is very important to you from a business-metric point of view. Often this data is delayed by a day via offline analytics. You do ETL for these aggregations, which is okay for offline business metrics. But when you want to do risk detection for online businesses, it needs to happen right away, in real time, and that's where those systems fail and where Apache Flink helps. Combined with a Lambda architecture, you can get real-time results with the help of a parallel system that captures the very latest data.
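To make the kind of aggregation described above concrete, here is a toy in-memory Python sketch of tumbling-window count/sum/min/max per seller. This is not Flink code; Flink computes this incrementally over an unbounded stream, and all names here are illustrative.

```python
from collections import defaultdict

def tumbling_window_aggregate(events, window_size_s):
    """Group (timestamp, seller, amount) events into tumbling windows
    and compute count/sum/min/max per seller per window."""
    windows = defaultdict(lambda: defaultdict(lambda: {
        "count": 0, "sum": 0, "min": float("inf"), "max": float("-inf")}))
    for ts, seller, amount in events:
        w = ts - (ts % window_size_s)          # start of the window this event falls in
        agg = windows[w][seller]
        agg["count"] += 1
        agg["sum"] += amount
        agg["min"] = min(agg["min"], amount)
        agg["max"] = max(agg["max"], amount)
    return windows

# Two events land in the [0, 60) window for seller s1, two more in [60, 120)
events = [(0, "s1", 10), (30, "s1", 20), (70, "s1", 5), (80, "s2", 7)]
result = tumbling_window_aggregate(events, 60)
```

A streaming engine keeps only the live windows in state instead of materializing the whole history, which is exactly why state management matters at scale.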

How has it helped my organization?

We work on risk, so we have to provide risk insights. When we have machine learning models, deep learning models, or rule-based systems, and we want to evaluate a user's behavior on the site, we want to evaluate it right away. If we don't evaluate it right away, it's of no use to us. Let's say that a fraudulent buyer comes in and is trying to buy an item. In the process of buying, when the person clicks the button and purchases the item, if we're not able to detect the fraud right then, it's of no use to us.

At the same time, we have to make sure that we are not frustrating legitimate users. So we have to strike a very hard balance between deciding whether it's a good person or a bad person, and we need to do it in real time. With offline systems, the aggregation of your data comes after one day, which is the typical ETL case and is of no use to us. We want it right away, but without adding too much friction either. This aggregation includes things like: how many transactions did you do in the last year? How many transactions did you do in the last minute? How many coupons did you use in the last five years, or in the last five minutes? All these kinds of metrics can feed into a machine learning or deep learning model or a rule-based system.

Let's say we feel that a seller has crossed a limit, or a seller or buyer is doing something fishy. We can show the user a captcha, or we can add some friction, such as multi-factor authentication. These are just examples. So our use case is real-time aggregations. One thing to note is that Flink will not give you true real-time aggregation; it gives you near-real-time aggregation. But the lag between near-real time and real time can be covered by another system. Let's say Flink is able to give you all the aggregations accurately with a five-minute delay; the exact delay depends on the complexity of the problem you're solving and how much infrastructure you have. If it is delayed by five minutes, you can have a parallel system that only takes care of the very latest data, and then combine the two in a Lambda architecture, where a historical system is combined with a real-time system to produce the aggregation. We apply the same concept in a near-real-time, integrated fashion, and then we are able to give real-time analytics on anything that happens on our site.
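The Lambda-style merge described here boils down to adding a delayed batch view to a speed-layer view that covers only the most recent events. A minimal sketch, with illustrative names and data:

```python
def merge_views(batch_counts, speed_counts):
    """Lambda-style serving merge: the batch view covers everything up to
    the last batch cut-off; the speed layer covers only events after it.
    Both views map seller -> count. Names are illustrative, not Flink API."""
    merged = dict(batch_counts)
    for seller, n in speed_counts.items():
        merged[seller] = merged.get(seller, 0) + n
    return merged

batch = {"s1": 100, "s2": 40}   # aggregated offline, delayed by hours
speed = {"s1": 3, "s3": 1}      # last few minutes, from the stream
current = merge_views(batch, speed)
```

The key invariant is that the two views cover disjoint time ranges, so simple addition gives the up-to-date total.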

What is most valuable?

You need to understand that Flink works on streaming data. Among the nuances of streaming are what we call delivery semantics: at-most-once, at-least-once, and exactly-once. Exactly-once is the hardest, and it is very well supported by Flink. What I mean by that is, when an event or piece of data comes in, you process that event exactly one time. This is very important when you're trying to compute aggregates.

Let's say you are a seller on the site, and depending on your selling limit, you're allowed to list only three items a day. If our count for the last five minutes is wrong, say it still shows two listings when you have actually made three because the latest listing is not yet reflected in the aggregate, we might wrongly allow a fourth. We can't afford a wrong decision on selling limits because the data is outdated or delayed. The data could also be lost, or not aggregated exactly the way it was supposed to be. Exact aggregations are what make this much harder to do in real time; that's why people fall back on offline systems like Spark and Hadoop. But you can do it in real time with exactly-once semantics, which is very well supported in Flink. So that's one of the best features.
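One common building block behind exactly-once behavior is idempotent processing, so that a replayed event after a failure does not double-count. The toy sketch below dedups by event ID; note that Flink's actual mechanism is checkpoint barriers and transactional sinks, not an ID set, so this is only an illustration of the semantics.

```python
class ExactlyOnceCounter:
    """Counts each event at most one time even if the source redelivers it.
    A stand-in for exactly-once semantics: deduplicate by event id.
    (Flink itself achieves this with checkpoint barriers, not an id set.)"""
    def __init__(self):
        self.seen = set()
        self.count = 0

    def process(self, event_id):
        if event_id in self.seen:
            return False               # duplicate delivery, ignored
        self.seen.add(event_id)
        self.count += 1
        return True

c = ExactlyOnceCounter()
for eid in ["a", "b", "a", "c", "b"]:  # "a" and "b" are redelivered
    c.process(eid)                     # only three distinct events counted
```

With at-least-once delivery and an idempotent operator like this, the observable result is exactly-once, which is what matters for the listing-limit example.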

Another feature is how Flink handles failures. It has something called the checkpointing concept. You're dealing with billions and billions of requests, so in a large distributed system, something is going to fail. Flink handles this by using the concept of checkpointing and savepointing, where they write the aggregated state into some separate storage. So in case of failure, you can basically recover from that state and come back.

I'll take the example of Call of Duty. Let's say when you play a game of Call of Duty there are five levels, and in each level there are different obstacles that you need to clear to advance. Say there are 10 obstacles per level. If you've cleared three obstacles and you die in the process, you don't want to start that level from scratch again; you want to restart from the third obstacle. That is what checkpointing and savepointing allow you to do: I have done work up to this point, and now I want to restart from exactly there. I don't want to redo that part again. So how does it help in our case? Say today is Wednesday, October 15. Up until 12:00 PM we have computed all the aggregations. At a regular interval, we take a snapshot of that aggregated state and store it in separate storage. If the system goes down after 12:00 PM because the load is high or because of some hardware failure, you can recover the state from 12:00 PM and reprocess from there. You're not recomputing the entire thing.

Let's say the seller's listing count was two; you don't want to start counting for that particular seller from zero. You want to count from two after 12:00 PM. That's what Flink helps you do.
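The checkpoint-and-recover idea above can be sketched in a few lines of Python. This is a toy model, not Flink's implementation: the snapshot stands in for durable checkpoint storage, and "crash" is simulated by simply calling recover.

```python
import copy

class CheckpointedAggregator:
    """Toy sketch of checkpointing: periodically snapshot the running
    state so that after a crash we resume from the last snapshot
    instead of reprocessing everything from zero."""
    def __init__(self):
        self.state = {}                # seller -> listing count
        self.checkpoint_store = None   # stands in for durable storage (e.g. S3)

    def add_listing(self, seller):
        self.state[seller] = self.state.get(seller, 0) + 1

    def checkpoint(self):
        self.checkpoint_store = copy.deepcopy(self.state)

    def recover(self):
        self.state = copy.deepcopy(self.checkpoint_store or {})

agg = CheckpointedAggregator()
agg.add_listing("s1"); agg.add_listing("s1")
agg.checkpoint()            # snapshot taken at count 2 (the "12:00 PM" snapshot)
agg.add_listing("s1")       # this increment is lost if we crash now
agg.recover()               # after the crash, we resume from count 2, not 0
```

After recovery the source replays only the events since the snapshot, so the count resumes from two rather than zero.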

Additionally, it helps you scale very well, but there are a lot of nuances. Because Flink allows you to do aggregations upon aggregations, maintaining the system is not that easy: you need machines with high RAM and good memory configuration, depending on what kind of problems you're solving. The problem I'm describing is actually the hardest kind, where you're putting state in Flink's memory: stateful problems. There are also stateless problems. Let's say you are a trucking company and you want to track the location data of your trucks. You have 15 trucks, and each truck emits latitude and longitude information as it moves. In this case, maybe your intention is only to track them in real time, and you're okay with a five- or ten-minute delay. You're not aggregating anything; there's nothing stateful about it. You take the data and dump it to storage. That storage could be anything, Elasticsearch for example. Then you can plot a trendline on a map of where each truck is going. The data comes in, gets written to a place, and you serve a graph from it. In this kind of application, Flink works very well and you can really do a lot of real-time analytics. But in the case of the problem I described, where you do real-time aggregations, it's a little bit tricky, because you need to recover from the right state so that you don't mess up the aggregations.

What needs improvement?

In Flink, maintaining the infrastructure is not easy. You have to design the architecture well. If you want to scale to a larger volume of streaming data, you need good machines. You need a good resilience architecture so that if something fails, you can recover with minimum downtime. You should have good storage systems to store and retrieve intermediate Flink state (in the case of stateful applications). Basically, all the problems that come with a distributed system apply. So you have to have all that infrastructure for it to perform well. The best way is to look at the use cases you wish to support 5-10 years ahead and design the architecture around Flink accordingly.

For how long have I used the solution?

I started using Apache Flink in October 2017. My team has been using it since May 2017.

What do I think about the stability of the solution?

In terms of stability with Flink, it is something that you have to deal with every time. Stability is the number one problem that we have seen with Flink, and it really depends on the kind of problem that you're trying to solve. If you're trying to solve the problems that we are trying to solve, which is stateful aggregations, you will find a lot of stability problems. That's why you have to invest money and time into understanding what problem you are trying to solve. How much infrastructure do you need? Stable infrastructure would take time to mature. Once you do that, you also need to spend time understanding and figuring out an optimized way of making it cluster-ready. You don't want to throw money just like that. If you want to throw money, you want to throw money in the right way.

When you create clusters of machines or something like that you're going to need a lot of analysis upfront. Let's say you're selling Flink as a product to different people, how can you do this? One way is you take a bunch of use cases, common use cases, and do experiments with it and form clusters based on that. You can then call them flavors. Let's say, for example, flavor A can do this kind of thing very well, flavor B can deal with these kind of things. And flavor A has its own infrastructure in the sense that flavor A has five job managers, 16 task managers, five zookeepers and all those configurations. Then different clients can use this kind of model. I'm just giving some ideas for how you can make the things work if you are selling this as a product.
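The "flavors" idea above is essentially a catalog of pre-validated cluster shapes. A sketch of what such a catalog might look like as data; every number here is made up for illustration and would come from the load experiments the reviewer describes.

```python
# Hypothetical pre-sized cluster "flavors" validated against common
# workload types. All figures are illustrative, not recommendations.
FLAVORS = {
    "A": {"job_managers": 5, "task_managers": 16, "zookeepers": 5,
          "task_manager_ram_gb": 64, "suited_for": "stateful aggregations"},
    "B": {"job_managers": 2, "task_managers": 4, "zookeepers": 3,
          "task_manager_ram_gb": 16, "suited_for": "stateless pass-through"},
}

def pick_flavor(stateful):
    """Trivial routing rule: stateful workloads need the heavier flavor."""
    return "A" if stateful else "B"
```

In practice the routing rule would consider event rate, state size, and latency targets rather than a single boolean, but the catalog structure is the point.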

What do I think about the scalability of the solution?

In terms of scalability, there's a lot of room for improvement in stateful cases. There is no system as of now that does scaling very well for stateful aggregations. There are other frameworks, like Apache Beam, which is actually an abstraction layer that runs on top of different engines. Then there are other things, like Apache Pulsar. But among these, Flink performs best, and it's actually very good for streaming architectures.

When we started, we were only three people working on a bunch of things. We were the first people. I was the software engineer. Basically I was the most junior of all. They're mostly principal engineers and I was a software engineer.

When we started, we were writing code and generating the JAR file, and we were creating our own clusters. That's why I have that insight. We were doing all the settings on those clusters so that everything would work, and then we would deploy the job on those clusters. We gave all these insights to the platform team so that they could evangelize the product for anyone in the company who would want to use Flink. For us, setting up the clusters was a side project, because we were solving other business problems, but we had to do it because there was nothing available to us. Then the platform team took all of that over. Now we just write code. We understand the business problem, write the code, and generate a JAR for them. Everything else is taken care of by them on the machines themselves.

It's getting popular really fast now. When it's easy for people to use, people will use it if it solves their problem. In my experience, it is going to be as big as Spark. It's just that you need a little more infrastructure, because it's trying to solve a complex problem, which is what people need to understand. If it is a complex problem, you have to spend time and energy to make it stable. If you have an infrastructure team that can create a stable Flink production setup, it will help you solve a lot of problems.

How are customer service and technical support?

I have not had any experience with customer support. We did everything ourselves, but we read a lot of Apache's documentation on that.

How was the initial setup?

The initial setup is not straightforward. It would take time. You have to know a lot of things. But one thing is that when we started, Flink was very new. The product is maturing and people are using it more. They will understand what people need and all that stuff. Maybe it would not be as difficult as it was, but it does require you to understand a lot of things.

So how do you set up a cluster? Let's say you want to do aggregation on 15 million events for one particular Flink job. When you deploy an application in Flink, it's called a Flink job. When you do that, how do you design the cluster? What boxes do you have? How much RAM do you need? Once you have one particular box, how do you design the topology of the cluster? The way it works is that there is something called the job managers, which are the coordinating nodes. Then there are the task managers, which are the machines that do the real work, the aggregation. Then there are the ZooKeeper nodes, which help you maintain the health of the cluster. You have to strike a balance.

How many ZooKeeper nodes do you want? How many job managers? How many task managers? How much RAM do you want in each machine? How much network bandwidth do you open up? How much traffic can the Flink cluster take in a stable manner so that it doesn't go down frequently? You have to run a lot of experiments with all of this setup; it depends on what problem you're solving and the load you're getting from your business.
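The sizing experiments described above usually reduce to a back-of-envelope calculation like the one below. The throughput-per-slot figure must come from your own load tests; every number in the example call is made up.

```python
import math

def estimate_task_managers(events_per_sec, events_per_sec_per_slot,
                           slots_per_task_manager):
    """Back-of-envelope sizing: how many task managers are needed to
    absorb a given event rate, given measured per-slot throughput.
    All inputs are assumptions you must validate experimentally."""
    slots_needed = math.ceil(events_per_sec / events_per_sec_per_slot)
    return math.ceil(slots_needed / slots_per_task_manager)

# e.g. 500K events/s, a measured 10K events/s per slot, 8 slots per TM
n = estimate_task_managers(500_000, 10_000, 8)
```

Headroom for failover and traffic spikes (often 30-50% extra) would be added on top of this raw figure.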

What about the implementation team?

Deployment takes around two minutes if everything is good. It's very fast.

How do we do that? Just like any application, we write code and build it into a deployable artifact, a JAR file. We have generally used the JAR route because Flink is JVM-based, so we use the JVM version, which supports Java and SQL. We write code for whatever we want to do, build it, and it gets packaged into a JAR file. We take the JAR file, upload it to the Flink server, and then you just click a button and it's deployed.

You don't have to do anything else. If you are starting out, the moment you install Flink on your local machine and start the server, you will see the Flink server UI come up. There's an option to deploy a job: build your code, generate a JAR file, then simply go to the UI, upload the JAR, and start the job. That's it.
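For automation, the same upload-then-run flow is also exposed over Flink's REST API (a JAR upload endpoint and a per-JAR run endpoint). The helpers below only construct the request targets; the paths match documented Flink REST endpoints, but verify them against your Flink version before scripting a deployment.

```python
# Helpers that build the REST targets for Flink's JAR upload/run flow.
# Endpoint paths are from Flink's REST API docs; the default web port
# is 8081, but confirm both against your own deployment.
def upload_url(host, port=8081):
    """POST a multipart JAR file here to upload it."""
    return f"http://{host}:{port}/jars/upload"

def run_url(host, jar_id, port=8081):
    """POST here to start the uploaded JAR as a job."""
    return f"http://{host}:{port}/jars/{jar_id}/run"
```

A CI pipeline can then do exactly what the UI button does: upload the built JAR, read back the JAR id, and POST to the run endpoint.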

Which other solutions did I evaluate?

Before Apache Flink my team tried Apache Storm and it did not work for them. I think Apache Storm is not being used by anyone else in the company.

What other advice do I have?

This is general advice for evaluating anything: for any problem, you have to really understand what you're trying to solve and what the nature of the problem is. By nature of the problem, the business side is one thing, but you also have to understand how you're solving things. Every new product is advertised as fast, scalable, highly distributed, and so on. But in what context? What kinds of use cases is the product built for? You have to understand that, and only then choose a product. Apache Flink is a fit if you want near-real-time metrics that may be useful for your business.

In that case, Apache Flink is your friend, because it's built on streaming architecture. If the nature of your application or your business is streaming, the data is coming at a very high rate and you want to do something with it, then Apache Flink is a good option. Another example I can give you: let's say you run a company, you are the CEO of Twitter, right? So in Twitter, a lot of people are writing a lot of stuff. A lot of streaming data is coming in. Because a lot of people are tweeting at the same time all around the world there's a lot of streaming of data coming in.

Let's say you're a celebrity and 5,000 people follow you. When you write a tweet, all 5,000 people have to see that tweet as quickly as possible. So when your tweet comes in, a very complex system in Twitter's backend has to take that tweet, work out who follows you, and display it on their feed timelines. This might sound easy when you only have five followers, but with 315 million people tweeting it's a very complex system, and you have to keep it available. So when you're dealing with streaming data, Apache Flink is a good option.

On a scale of one to ten, I would rate Apache Flink around seven to eight. It's pretty good if you're solving a streaming type of problem. My experience is limited; I have only worked a little with Apache Storm and with Apache Flink. Between those, if we're talking about streaming, Apache Flink wins hands down, but there are other products, like Apache Pulsar, that I have no experience with. So my perspective is limited.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Sandesh Deshmane
Software Architect at a tech vendor with 501-1,000 employees
Real User
Top 5
Provides out-of-the-box checkpointing and state management

Pros and Cons

  • "With Flink, it provides out-of-the-box checkpointing and state management. It helps us in that way. When Storm used to restart, sometimes we would lose messages. With Flink, it provides guaranteed message processing, which helped us. It also helped us with maintenance or restarts."
  • "The state maintains checkpoints and they use RocksDB or S3. They are good but sometimes the performance is affected when you use RocksDB for checkpointing."

What is our primary use case?

We have our own infrastructure on AWS. We deploy Flink on a Kubernetes cluster in AWS. The Kubernetes cluster is managed by our internal DevOps team.

We also use Apache Kafka. That is where we get our event streams. We get millions of events through Kafka, between 300K and 500K events per second through that channel.

We aggregate the events and generate reporting metrics based on the actual events that are recorded. Certain real-time, high-volume events come through Kafka like any other stream, and we use Flink for aggregation in this case. So we read these high-volume events from Kafka and aggregate them; there is a lot of business logic running behind the scenes. We use Flink to aggregate those messages and send the result to a database so that our API layer or BI users can read directly from the database.

How has it helped my organization?

Flink has improved my organisation by enabling us to become independent of Redis, which we used as an intermediate caching layer with Apache Storm for aggregation. Redis was a bottleneck. With an increasing number of messages, Redis was becoming full, and there was also a higher chance of errors because we were doing the checkpoints and state management manually.

Flink provides out-of-the-box checkpointing and state management, which helps us in that way. When Storm restarted, sometimes we would lose messages or intermediate state. Flink provides guaranteed message processing, which helped us. It also helped us with application maintenance, deployments, and restarts.

What is most valuable?

When we use the Flink streaming pipeline, the first thing we use is the windowing mechanism with the event-time feature. With that, aggregation in Flink is very easy. Previously, we were using Apache Storm. Apache Storm is stateless, and Apache Flink is stateful. With Apache Storm, we had to use an intermediate distributed cache. But because Flink is stateful, it can manage the state and the failure mechanism for us. The result is that we do aggregation every 10 minutes, and we do not need to worry about our application stopping within those 10 minutes and then restarting.

When we were using Storm, we used to manage all of it ourselves. We created manual checkpoints in Redis, but Flink supports inbuilt features like checkpointing and statefulness. There is event time or ingestion time that you can use for your messages.

Another important thing is the out-of-order message processing. When you use any streaming mechanism, there is a chance that your source is producing messages that are out-of-order. When you build a state machine, it's very important that you can have the messages in order, so that your computations/results are correct. What happens with Storm or any other framework that you're using is that to get messages in order, you have to use an intermediate Redis cache, and then sort the messages. When we use Flink, it has an inbuilt way to have the messages in order, and we can process them. It saves a lot of time and a lot of code.
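The buffering idea behind that inbuilt in-order processing can be sketched as follows. This is a toy model of watermark-based reordering, not Flink's internals: events are held back until the watermark (max event time seen minus an allowed lateness) has passed them, then released in timestamp order.

```python
import heapq

class WatermarkBuffer:
    """Toy sketch of event-time reordering: buffer out-of-order events
    and release them in timestamp order once the watermark has passed
    them. Lateness beyond max_lateness would be dropped by real engines;
    this sketch simply never releases such events early."""
    def __init__(self, max_lateness):
        self.max_lateness = max_lateness
        self.heap = []                 # min-heap of (timestamp, payload)
        self.max_ts = float("-inf")

    def add(self, ts, payload):
        heapq.heappush(self.heap, (ts, payload))
        self.max_ts = max(self.max_ts, ts)
        watermark = self.max_ts - self.max_lateness
        ready = []                     # events now safe to emit in order
        while self.heap and self.heap[0][0] <= watermark:
            ready.append(heapq.heappop(self.heap))
        return ready
```

With `max_lateness=5`, an event at t=3 arriving after one at t=10 is still emitted before it, because t=3 is only released once the watermark (10 - 5) has passed it.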

I have written both Storm and Flink code. With Storm, I used to write a lot of code, hundreds of lines, but with Flink it's less, around 50 to 60 lines. I don't need to use Redis as the intermediate cache, so a lot of code is saved. I have to aggregate over roughly 10-minute windows, and there is an inbuilt mechanism for that. With Storm, I needed to write my own logic, a bunch of connected bolts, and the intermediate Redis. The same code that would take me one week to write in Storm, I could do in a couple of days with Flink.

I started with Flink five to six years ago for one use case and the community support and documentation were not good at that time. When we started back again in 2019, we saw that documentation and community support were good.

What needs improvement?

The state maintains checkpoints and they use RocksDB or S3. They are good but sometimes the performance is affected when you use RocksDB for checkpointing.

We can write Python bolts and applications inside Apache Storm code, and it supports Python as a programming language, but with Flink, the Python support is not that great. When we do machine learning or data science work, we want to integrate the data science or machine learning pipeline with our real-time pipeline, and most of that work is in Python.

It was very easy with Storm. Storm supports native Python language, so integration was easy. But Flink is mostly Java. The integration of Python with Java is difficult, so it's not direct integration. We need to find an alternative way. We created an API layer in between so the Java and Python layers were communicating by using an API. We just called data science models or ML models using the API which runs in Python while Flink runs in Java. We would like to see improvement where we can have another way to run it. Currently, it's there, but it's not that great. This is an area that we would like to see improvement. 
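The API-layer workaround described above can be sketched with the Python standard library: a small HTTP endpoint wrapping a stand-in model that the Java Flink job calls with JSON features. The scoring rule, field names, and port are made up for illustration.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

def score(features):
    """Stand-in for a Python ML model; the real model is hypothetical."""
    return {"risk": 1.0 if features.get("amount", 0) > 100 else 0.0}

class ModelHandler(BaseHTTPRequestHandler):
    """Minimal HTTP wrapper so a Java Flink job can call the Python
    model by POSTing JSON features, as the workaround describes."""
    def do_POST(self):
        body = self.rfile.read(int(self.headers["Content-Length"]))
        payload = json.dumps(score(json.loads(body))).encode()
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(payload)))
        self.end_headers()
        self.wfile.write(payload)

    def log_message(self, *args):      # keep the sketch quiet
        pass

def serve(port=8000):
    HTTPServer(("127.0.0.1", port), ModelHandler).serve_forever()
```

On the Flink side, an async operator would POST each event's features to this endpoint and attach the returned score, keeping the JVM pipeline and the Python model decoupled.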

For how long have I used the solution?

I have been using Apache Flink for one-and-a-half years now. 

What do I think about the stability of the solution?

Stability-wise, it's good and stable. We do aggregations on data streams received from Kafka. The Flink application connects to multiple Kafka topics and reads the data. The number of messages generated in Kafka is very high. Sometimes in production we see glitches where data is mismatched. Our Flink runs on a Kubernetes cluster, so sometimes when a worker node crashes or the application restarts, we see a mismatch in the aggregation results.

We are yet to verify whether it's a problem with the Flink framework or with the code that does the aggregation and checkpointing. We are yet to figure out whether the data is lost when a worker node crashes or when we restart the Flink application, or whether there is a problem with the way we have done the implementation. The problem is intermittent and not always reproducible.

What do I think about the scalability of the solution?

It's easy to scale because it supports Docker. Once you have Docker containers, you can deploy it on Kubernetes or any other container orchestrator. So scalability-wise, it's good; you can just launch the cluster. When you have an automated cluster-launching mechanism, you can easily scale up and down.

So far, there are close to 10 users who use Flink, and most of them are software engineers, senior software engineers, DevOps engineers, a DevOps architect, and a cloud architect.

Most of our work was on Storm, but we saw improvement with Flink, so we have moved one business application. We have a couple of other main business applications and data pipelines, and we would like to move those as well.

How are customer service and technical support?

We have not used technical support. There are good forums and community support. 

Which solution did I use previously and why did I switch?

We switched from Storm to Flink. We looked at Apache Spark Streaming as well, but some of the use cases were better in Apache Flink. We chose Flink over Spark Streaming and Kafka Streams. We thought Flink was better and so we went with it.

Spark is micro-batch, but Flink offers true streaming. Memory management with Apache Spark is not that great, whereas Flink has automatic memory management. For our use case, we found Flink faster compared to Spark. The windowing mechanism that Flink provides is also better than Spark's.

How was the initial setup?

In terms of the implementation, we initially set up our development instances for Mac, which was easy. We have the documentation available. For the setup, when we wanted to move it to production, it provided the setup on Kubernetes. That Kubernetes setup is a little bit complicated. You need a person who understands Kubernetes well. A developer alone cannot do it. When you want to take it to production, the setup on Kubernetes using Docker is a little bit complicated. We need something like a one-click deployment script that can launch the cluster so that you can then do it.

In another case, we used AWS. There is Flink support in AWS EMR that we could use readily. It's a managed service, so it was easier for us: we don't need to bother with launching the cluster to run our workload. When we have to manage our own cluster using Kubernetes and Flink, it's a little bit complicated; there are a bunch of manual steps to do.

Moving to production, the EMR setup took a couple of days, but the Kubernetes cluster setup took us two to three weeks. The setup required a couple of team members from the DevOps team and the engineering side.

In terms of our deployment strategy, we were already using the Kubernetes cluster for most of the use cases, and we wanted to use the same Kubernetes cluster. The first thing we wanted to do is Dockerize the application that we were running and then use the same Kubernetes cluster or create a separate workspace in that and use it. 

What about the implementation team?

We did the deployment ourselves. We have a team of three or four DevOps guys who manage our Kubernetes cluster.

For the deployment, we needed one or two guys and for development, we are three to four people. We had a lot of other business applications that are in Flink. 

Which other solutions did I evaluate?

Apache Storm, Spark Streaming, and Kafka Streams.

What other advice do I have?

My advice would be to validate your use case. If you are already using a streaming mechanism, I suggest you validate what your actual use cases are and what the advantages of Flink would be. Make sure that the use case you are trying can be done by Flink. If you're doing simple aggregation and you don't need to worry about message order, then whatever you're already using, such as Storm, is fine. If you see features that are there and are useful for you, then you should go for Flink.

Validate your use case, validate your data and pipeline, do a small POC, and see if it is useful. If you think it's useful and worth doing a migration from your existing solution, then go for it. But if you don't already have a solution and Flink will be your first one, then it's always better to use Flink.

The biggest lesson I have learned is that deployment using Kubernetes was a little difficult. We did not evaluate it when we started the work, so we migrated the code but did not take on the deployment part until later. If we had looked at the deployment part initially, we might have chosen Kafka Streams instead, because we were getting a similar result, and on the deployment side Kafka Streams is easy: you don't need to worry about the cluster.

I would rate Apache Flink an eight out of ten. I would have given it a nine or so if the deployment on Kubernetes weren't a little complicated.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Jyala Rahul Jyala
Sr Software Engineer at a tech vendor with 10,001+ employees
Real User
Top 5
Good documentation, API support, and metrics, but it only has partial Python support

Pros and Cons

  • "The documentation is very good."
  • "We have a machine learning team that works with Python, but Apache Flink does not have full support for the language."

What is our primary use case?

We are using Flink as a pipeline for data cleaning. We are not using all of the features of Flink. Rather, we are using Flink Runner on top of Apache Beam.

We are a CRM product-based company with a lot of customers that we provide our CRM to. We like to give them as much insight as we can based on their activities, including how many transactions they perform over a particular time. We also have other services, including machine learning, but the raw data is not very clean, which means you would have to clean it up manually; for real-time work with Big Data, that is not very good.

We use Apache Flink with Apache Beam as part of our data cleaning pipeline. It is able to perform data normalization and other features for clearing the data, which ultimately provides the customer with the feedback that they want. We also have a separate machine learning feature that is available, which can be optionally purchased by the customer.
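As a rough illustration of the kind of normalization such a pipeline performs, here is a plain-Python sketch (field names and cleaning rules are invented for illustration; the real pipeline uses Beam transforms running on the Flink runner):

```python
def clean_record(raw):
    """Normalize one CRM event: trim text, canonicalize the email, coerce the amount."""
    email = raw.get("email", "").strip().lower()
    if "@" not in email:
        return None                      # drop records that cannot be repaired
    return {
        "email": email,
        "name": raw.get("name", "").strip().title(),
        "amount": round(float(raw.get("amount", 0) or 0), 2),
    }

raw_events = [
    {"email": "  ANA@Example.COM ", "name": " ana lopez ", "amount": "19.5"},
    {"email": "not-an-email", "name": "x", "amount": "1"},
]
# Keep only the records that survived cleaning.
cleaned = [r for r in (clean_record(e) for e in raw_events) if r is not None]
```

In the real pipeline, the same per-record function would sit inside a Beam `ParDo`-style transform so the runner can parallelize it.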

How has it helped my organization?

We have a set of pipeline services that we run. For example, we might use Apache Beam for running a four-hour service, and we use Flink to run it. You can run any job using Flink, including an Apache Spark job. 

We have many systems including Elasticsearch Database, MongoDB, and other services. Based on what we have running, we want to clean and transform some of our data.

Currently, we have two implementations of Flink and one of them is running Kafka, whereas the other one is Cassandra. Based on that, we process all of the things that we want and if it's streaming then we used Kafka, whereas if it is batch then we use Cassandra. The result of all of these services is that it can provide a much better user experience.

What is most valuable?

The most valuable feature is that there is no distinction between batch and streaming data. When we want to use batch mode, we use Apache Spark. The problem with Spark is that when it comes to time-series data, it does not train well. With Flink, however, we can have the streaming capability that we want.
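The "no distinction between batch and streaming" point can be illustrated with a toy sketch in plain Python (not Flink's actual API): the same transformation chain consumes either a bounded collection or a lazily produced, stream-like source.

```python
from typing import Iterable, Iterator

def pipeline(events: Iterable[dict]) -> Iterator[float]:
    """One transformation chain applied identically to batch or stream input."""
    for e in events:
        if e["value"] >= 0:            # filter: drop bad readings
            yield e["value"] * 1.5     # map: apply a scaling factor

# Batch: a bounded collection.
batch_result = list(pipeline([{"value": 2.0}, {"value": -1.0}, {"value": 4.0}]))

# Streaming: an unbounded-style source consumed lazily, one element at a time.
def sensor_stream():
    for v in [1.0, 3.0]:
        yield {"value": v}

stream_result = [x for x in pipeline(sensor_stream())]
```

The point is that the pipeline logic is written once; only the nature of the source changes.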

The documentation is very good.

A lot of metrics are supported and there is also logging capability.

There is API support.

What needs improvement?

We have a machine learning team that works with Python, but Apache Flink does not have full support for the language. We needed to use Java to implement some of our job posting pipelines.

For how long have I used the solution?

We have been using Apache Flink for one to one and a half years.

What do I think about the stability of the solution?

Stability is pretty good and we haven't had any problem with it.

We are using this product extensively and we have new products being onboarded.

What do I think about the scalability of the solution?

Apache Flink scales well. As long as we are using Kubernetes, we can scale as much as we want.

We have a data team with between twenty and twenty-five people. It is split into two groups where the first group works on reporting, machine learning, and background operations. The second group works with Big Data.

How are customer service and technical support?

We have not used technical support from Apache.

Community support is available.

Which solution did I use previously and why did I switch?

Prior to Flink, we used Apache Spark.

We had to move to Flink because of the streaming capabilities that it has. In our architecture, we have one layer for batch processing and the other for streaming. This is quite a pain for us because we don't want to have two separate jobs to handle both streaming and batch processing. Using Flink, we are able to utilize the API and handle both of these jobs.

How was the initial setup?

The complexity of the initial setup is dependent on your use cases and what it is that you are trying to achieve. I found that we didn't have any problem with it.

This product can be deployed on-premises or as a SaaS on the cloud. It depends on the requirements of the customer.

The deployment using Kubernetes takes approximately 30 minutes to complete.

What about the implementation team?

Our in-house team is responsible for scaling and other maintenance. There is very good documentation available for this.

What's my experience with pricing, setup cost, and licensing?

This is an open-source platform that can be used free of charge.

Which other solutions did I evaluate?

We got to learn about Apache Flink through using Apache Beam. Originally, I did not know very much about Flink. The problem with Apache Beam is that you cannot run it alone. Once you create the jobs, you need a tool to run them. There are two options left, being Apache Spark and Apache Flink. We chose Flink because it was more compatible with what we wanted to do.

What other advice do I have?

We are very happy with the product, and we have been able to achieve all of the use cases that we are expected to deliver for our customers.

Over time, I have seen many improvements including in the documentation. An example is that when we first started using this product, almost two years ago, there was no support available.

At this point, we do not have much opt-in but we have some use cases to ensure that our system is not breaking. We have QA who can validate these things based on what is expected versus what we have done.

My advice for anybody who is considering Flink is that it has very mature documentation and you can do what you want. It is a very good way to implement streaming pipelines and you won't have any problems.

The biggest lesson that I have learned from using Flink is how we can customize the experience for the customer and how important it is to keep up with the industry. We don't want to be left behind.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.
RP
Software Development Engineer III at a tech services company with 5,001-10,000 employees
Real User
Top 10
Provides truly real-time data streaming with better control over resources; ML library could be more flexible

Pros and Cons

  • "This is truly a real-time solution."
  • "The machine learning library is not very flexible."

What is our primary use case?

My company is a cab aggregator, similar to Uber in terms of scale as well. Just like Uber, we have two sources of real-time events: the customer mobile app and the driver mobile app. We get a lot of events from both of these sources, and there are a lot of things you have to process in real time; that is our primary use case for Flink. It includes things like surge pricing, where many people wanting to book a cab increases the price and fewer people drops it. All of that needs to be done quickly and precisely. We also need to process events from drivers' mobile phones and calculate distances. It all requires a lot of data to be processed very quickly and in real time.
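The surge-pricing idea can be sketched in plain Python (a toy model with invented thresholds, not the production logic): bucket booking requests into fixed time windows and scale the base fare by the demand in each window.

```python
from collections import Counter

BASE_FARE = 100.0

def surge_multiplier(requests_in_window: int) -> float:
    """Toy demand curve: more concurrent requests means a higher multiplier."""
    if requests_in_window >= 50:
        return 2.0
    if requests_in_window >= 20:
        return 1.5
    return 1.0

def price_by_window(events, window_size_s=60):
    """events: (timestamp_s, rider_id) pairs. Returns {window_start: fare}."""
    counts = Counter((ts // window_size_s) * window_size_s for ts, _ in events)
    return {w: BASE_FARE * surge_multiplier(n) for w, n in counts.items()}

events = [(t, f"rider{t}") for t in range(0, 30)]      # 30 requests in window 0
events += [(t, f"rider{t}") for t in range(60, 65)]    # 5 requests in window 60
fares = price_by_window(events)
```

In a real Flink job, the per-window counting would be a keyed tumbling-window aggregation rather than an in-memory `Counter`.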

How has it helped my organization?

The end-to-end latency was drastically reduced, and our capability of handling high throughput has increased by using Flink. It provides a lot of functionality with its windows and maps and that gives us a lot of extra features and power that other frameworks don't have. The solution has helped us by enabling a lot of creative features so we are now able to detect if something abnormal is happening, like a driver has deviated from the set route or the car has not moved for a long period of time, all in real time. Being able to check this has led to more secure rides for our customers. 

What is most valuable?

The most valuable feature of Apache Flink would be that it is truly real-time. Unlike Spark and other technologies, it's not recurring batch processing, and it also gives me better control over resources. For example, in Spark it's very difficult to create multiple parallel streams, and it consumes the memory of your entire cluster very greedily. With Flink, I have very good control: I can choose the number of task managers with a fixed amount of memory and configure the parallelism. This flexibility is very useful when scaling Flink.
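The resource-control point can be illustrated with a plain-Python analogy (not Flink configuration): a worker pool of a fixed size processes any number of input partitions without ever exceeding its configured parallelism, much like a fixed set of task slots with bounded memory.

```python
from concurrent.futures import ThreadPoolExecutor

PARALLELISM = 4  # analogous to a fixed task-slot count: resources are bounded up front

def process_partition(partition):
    # Stand-in for per-partition work (aggregation, transformation, etc.).
    return sum(partition)

partitions = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]

# The pool never exceeds PARALLELISM concurrent workers, regardless of input size.
with ThreadPoolExecutor(max_workers=PARALLELISM) as pool:
    totals = list(pool.map(process_partition, partitions))

grand_total = sum(totals)
```

The fifth partition simply waits for a free slot instead of grabbing extra cluster memory, which is the behavior the reviewer contrasts with Spark.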

What needs improvement?

Flink has become a lot more stable but the machine learning library is still not very flexible. There are some models which are not able to plug and play. In order to use some of the libraries and models, I need to have a Python library because there might be some pre-processing or post-processing requirements, or to even parse and use the models. The lack of Python support is something they can maybe work on in the future. 

For how long have I used the solution?

I've been using this solution for two years. 

What do I think about the stability of the solution?

The solution has become a lot more stable over time. We have around 10-12 users, most of whom are software developers. Even if we run our task managers on cheap servers, we make sure that our job manager runs on a very expensive server, which never goes down; things remain more stable that way. We're a large company and have teams dedicated to the infrastructure who take care of maintenance and make sure that jobs run smoothly. A small company could do the maintenance itself.

What do I think about the scalability of the solution?

It's a good product and allows me to scale very easily. If I'm getting more and more data, I can very easily increase the memory allocated for every task manager and increase the parallelism. We are increasing our usage of Flink as much as possible. There are some things that we still run on Spark, but whenever we need to scale, want easy resource management, and need a more real-time streaming solution, we now usually go for Flink. We have never faced any scalability issues, even though we are running at a very high volume. I think we are yet to reach the scaling limits of Flink.

How are customer service and technical support?

We have never used Apache's tech support. We usually just Google for our questions. If we don't get the answers directly from Google, we go through the documentation, which is comprehensive, and usually find our answers there. 

Which solution did I use previously and why did I switch?

We previously used Spark for streaming, but not for real time applications. We have moved some of our services from Spark to Flink. We also use Kafka extensively, but that is mostly for asynchronous communication between different services. Kafka is a totally different use case. You cannot substitute it with Flink. Overall in terms of streaming, we have used Spark, Kafka and Flink.

How was the initial setup?

The initial setup was not very straightforward because compared to other frameworks, Flink is quite new. There isn't yet a good online community, online blogs, or guidance. You have to rely more or less on their documentation for everything. Even if you go to Stack Overflow, for Spark, you will get lots of questions and answers which will help you. With Flink, you have to actually read a lot. It's not as straightforward as other frameworks.

What's my experience with pricing, setup cost, and licensing?

We have only used the open-source version of Flink. 

What other advice do I have?

My advice would be to make sure you understand your requirements, Flink's architecture, how it works, and whether it is the right solution for you. They provide very good documentation, which is useful. The solution isn't suitable for every case, and it may be that Spark or some other framework is more suitable. If you are a major company that cannot afford any downtime, given that Flink is a relatively new technology, it might be worthwhile investing in monitoring. That would include writing scripts for monitoring and making sure that the throughput of the applications is always steady. Make sure your monitoring, and your SOPs around monitoring, are in place.

I would rate this solution a seven out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Vinod Iyer
Principal Software Engineer at a tech services company with 1,001-5,000 employees
Real User
Top 10
Offers good API abstraction and in-memory state management

Pros and Cons

  • "Apache Flink is meant for low latency applications. You take one event, and if you want to maintain a certain state until another event comes so you can associate those events together, in-memory state management was a key feature for us."
  • "In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves."

What is our primary use case?

The last POC we did was for map-making; I work for a map-making company. India is one area, and within it you have states, within those districts, and within those cities; there is a hierarchy of areas. When you go to Google and search for a city within India, you see the entire hierarchy of where it falls within India.

We get the data from third-party sources, government sources, or other sources where we can. This data is geometry; it's not a straightforward index. If we get raw geometry, we get the entire map and the layout.

We do geometry processing. Our POC was more of processing geometry in a distributed way. The exploration that I did was more about distributing this geometry and breaking this big geometry.

How has it helped my organization?

Flink moved on to become a standard technology for our location platform. There's only one open location platform available right now, and that platform leverages Flink, because Flink is the component for stream processing. Its adoption extends across the organization: anywhere we need low latency applications, we use Flink.

What is most valuable?

Apache Flink is meant for low latency applications. You take one event and maintain a certain state; when another event comes and you want to associate those events together, in-memory state management is a key feature for this.

Checkpointing was important since consumption is done from Kafka. There was a continuous pool of data coming in from cars, which was put into Kafka; this particular Apache Flink component came in and started processing it. When there's a failure or something similar, checkpointing is very important.

It also helped us achieve exactly-once semantics. Another valuable feature is the API abstraction, which is nicely done. Anyone can understand it; it's not very complex. Anyone can go through all the transformations and everything they have, and it's easy to use. It's a well-balanced abstraction.
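The checkpointing idea described above can be sketched in plain Python (a toy model, not Flink's API): aggregated state is snapshotted every few events, and recovery after a failure rolls back to the last completed snapshot.

```python
import copy

class CheckpointingCounter:
    """Toy stateful operator: counts events per key and snapshots its state."""

    def __init__(self, checkpoint_interval=3):
        self.state = {}
        self.interval = checkpoint_interval
        self.seen = 0
        self.last_checkpoint = {}

    def process(self, key):
        self.state[key] = self.state.get(key, 0) + 1
        self.seen += 1
        if self.seen % self.interval == 0:
            # In a real system this snapshot would go to durable storage.
            self.last_checkpoint = copy.deepcopy(self.state)

    def recover(self):
        """On failure, roll back to the last completed checkpoint."""
        self.state = copy.deepcopy(self.last_checkpoint)

op = CheckpointingCounter()
for k in ["a", "b", "a", "a"]:
    op.process(k)
# A checkpoint was taken after the 3rd event; the 4th event was applied after it.
op.recover()  # simulate a crash: state rolls back to the checkpointed values
```

Any events processed after the last checkpoint would be replayed from the source (here, Kafka) on recovery, which is what makes exactly-once semantics possible.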

What needs improvement?

In terms of improvement, there should be better reporting. You can integrate with reporting solutions but Flink doesn't offer it themselves. 

They're more about the processing side; reporting is out of their scope. As far as reporting is concerned, you can integrate other backend solutions, and they have that flexibility. The APIs are good enough. Its in-memory processing is so fast that you can work with the data faster.

What do I think about the stability of the solution?

The stability was good enough. There were a few issues that were application dependent. From the processing standpoint, it did what it was expected to do as such; there were a few issues with Python integration, like checkpointing. The checkpointing was not done properly at times, but again, that was more about the integration going faster and optimizing our checkpointing intervals. As far as Flink is concerned, it has good checkpointing and savepoints.

There were 50 developers and DevOps working on it.

How are customer service and technical support?

I was on the DevOps side. Support was all driven by a different team in Chicago. I was a completely hands-on developer; my interaction was more about using the API and developing applications. I didn't need to use support. Flink was straightforward.

How was the initial setup?

The deployment can be done on any kind of distributed resource manager. I haven't used it, but it's a good option that you could even integrate it within APIs. This adds flexibility.

I was not part of the deployment when it was initially done. When I came into the picture, it was more about the API; we had already started using it at the application level in the organization. Initially, when it was done in my previous organization, it was an earlier version of Flink. I think they started off in 2016, and there might have been some glitches or technical issues. When I came in, it was pretty smooth. I didn't find any issues, and I hopped into Flink easily.

Which other solutions did I evaluate?

We also looked at Spark Streaming versus Apache Flink. Spark Streaming is not real-time; that's where we understood that Flink is good enough when you want real-time processing. That's the only real-time processing engine we have right now, and Spark Streaming is more for big data sets.

If you want real real-time processing, you have to invest in Flink, but part of that is that when you use Flink, you have everything stored. You store the state in memory as well, so you add the cost of using that engine compared to Spark Streaming. It's not mandatory; it's up to the application. But if you really want real-time processing, if you want to store state, and if you really want a low latency application, that's when I would go with Flink. Spark Streaming is more for when it's okay to have a bit of a delay rather than a really low latency application.

Flink gives you flexibility. The reason we chose Spark was that people in our company were already familiar with it. We haven't started working on it yet because it's a half-done POC.

What other advice do I have?

Flink is really simple and simple to adopt. You can use any backend state management tool, like a DB or something of that sort. It has the flexibility to integrate with different technologies, which is also very important. I believe it's pretty well suited for low latency. The API is pretty well written to support you.

I would rate Apache Flink an eight out of ten. 

Disclosure: I am a real user, and this review is based on my own experience and opinions.
BH
Senior Software Engineer at a tech services company with 5,001-10,000 employees
Real User
Top 10
Drastically reduces turnaround/processing time; documentation is in-depth and most things are straightforward

Pros and Cons

  • "The event processing function is the most useful or the most used function. The filter function and the mapping function are also very useful because we have a lot of data to transform. For example, we store a lot of information about a person, and when we want to retrieve this person's details, we need all the details. In the map function, we can actually map all persons based on their age group. That's why the mapping function is very useful. We can really get a lot of events, and then we keep on doing what we need to do."
  • "The TimeWindow feature is a bit tricky. The timing of content and the windowing changed a bit in 1.11, when they introduced watermarks. A watermark basically associates every piece of data with a timestamp. The timestamp could be anything, and we can provide it; whenever I receive a tweet, I can assign a timestamp for when I got it. The watermark helps us order the data. Watermarks are tricky when you use multiple input streams in the pipeline. For example, if you have three sources from different locations and you want to combine all those inputs and perform some logic, you have to apply a TimeWindowAll, which means all the events from the upstream sources should fall in that TimeWindow. Internally, it is a batch of events that gets collected every five minutes or whatever timing is given. Sometimes the use case for TimeWindow is a bit tricky; it depends on the application and on how people have configured the TimeWindow. This kind of documentation is not updated. Even the test case documentation is wrong; it doesn't work. Flink has updated the version of Apache Flink, but they have not updated the testing documentation, so I have to understand it manually. We have also been exploring failure handling. I was looking into the changelogs where they have posted their future plans and what they are going to deliver; we have two concerns regarding this, which have been noted down, and I hope that in the future they will provide this functionality. Integration of Apache Flink with other metric services or failure-handling tools needs some kind of update, or in-depth knowledge of it is required beyond the documentation. We have a use case where we want analytics about how much data we process and how many failures we have. For that, we need a metrics tool for implementing counters, and we can manage reports in the analyzer. This kind of integration is pretty straightforward, but they expect people to be well familiar with everything before using it. They have given a complete file that you can update, but it took some time; there is a learning curve, which consumed a lot of time. The product is evolving to newer versions, but the documentation does not reflect the updates; it is not well incorporated. Hopefully, these things will get resolved as they implement them. Failure handling is another area where it is a bit rigid, or not that flexible. We never use this for scaling because the complexity is very high in case of a failure. Processing and providing the scaled data back to Apache Flink is a bit challenging. They have the concept of offsets, which could be simplified."

What is our primary use case?

For services that need real-time, fast updates and have a lot of data to process, Flink is the way to go. Apache Flink with Kubernetes is a good combination. Lots of data transformation, grouping, keying, and state management are some of the features of Flink. My use case is to provide fresh data as fast as possible, in real time.

How has it helped my organization?

The main advantage is the turnaround time, which has been reduced drastically because of Apache Flink. Earlier, it used to take a lot of processing time, but now things have changed and everything is in almost real time. We get the latest data in much less time. There is no waiting for data or lag in the application; time has been one of the important factors.

The other factor is memory. The utilization of the machines has been more efficient since we started using this solution. Big data applications definitely use a large group of machines to process the data, and these machines are not always optimally utilized; some of them might not be required, but they still hold resources. In Kubernetes, we can assign resources, and in Apache Flink there is a configuration where you can do the deployment in combination with a single cluster node. Scalability is quite flexible in Flink with task managers and resource configuration.

What is most valuable?

MapFunction and FilterFunction are the most useful, or most used, functions in Flink. Data transformation becomes easy. For example, for applications that store information about people and want to retrieve a person's details in some kind of relation, in the map function we can actually filter all persons based on their age group. That's why the mapping function is very useful. This could be helpful in analytics to target specific news to a specific age group.
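The map/filter idea above can be sketched in plain Python (the field names and age buckets are invented for illustration): a filter step keeps the records of interest, and a map step tags each record with its age group.

```python
people = [
    {"name": "Asha", "age": 16},
    {"name": "Ben", "age": 34},
    {"name": "Carla", "age": 67},
]

def age_group(person):
    """Map step: tag each record with a coarse age bucket."""
    age = person["age"]
    bucket = "minor" if age < 18 else "adult" if age < 65 else "senior"
    return {**person, "group": bucket}

# filter -> map, mirroring a FilterFunction and MapFunction chained in a pipeline
adults_plus = [age_group(p) for p in people if p["age"] >= 18]
groups = [p["group"] for p in adults_plus]
```

In Flink, each step would be a separate operator in the DataStream pipeline rather than a list comprehension, but the per-record logic is the same.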

What needs improvement?

The TimeWindow feature. The timing of the content and the windowing changed a bit in 1.11, when they introduced watermarks.

A watermark basically associates data in the stream with a timestamp. The documentation can be consulted, but while they have updated the rest of the documentation, they have not updated the testing documentation. Therefore, we have to try things manually to understand a few concepts.
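As a rough illustration of the watermark concept (plain Python, not Flink's API): a watermark trails the largest timestamp seen so far by an allowed lateness, and a window is only closed once the watermark passes its end, which lets out-of-order events still land in the right window.

```python
def watermark_pass(events, window_size=10, max_lateness=2):
    """events: (timestamp, value) pairs, possibly out of order.
    Returns the windows closed, in closing order."""
    max_ts = float("-inf")
    windows = {}            # window_start -> list of values
    closed = []
    for ts, value in events:
        windows.setdefault((ts // window_size) * window_size, []).append(value)
        max_ts = max(max_ts, ts)
        watermark = max_ts - max_lateness    # watermark trails the newest event
        for start in sorted(windows):
            if start + window_size <= watermark:   # window end has been passed
                closed.append((start, windows.pop(start)))
    return closed

# Out-of-order input: ts=9 arrives after ts=11, but window [0, 10) only
# closes once the watermark (max_ts - 2) passes 10, so "c" is not lost.
out = watermark_pass([(1, "a"), (11, "b"), (9, "c"), (13, "d")])
```

Tuning `max_lateness` is exactly the trade-off the review describes: too small and late events are dropped, too large and results are delayed.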

Integration of Apache Flink with other metric services or failure-handling tools needs some kind of update, or in-depth knowledge is expected before integrating. Consider a use case where you want to get analytics about how much data you have processed and how much failed. Prometheus is one of the common metric tools supported out of the box by Flink, along with other metric services. The documentation is straightforward, but there is a learning curve with metric services, which can consume a lot of time if you are not well versed in those tools.

Basic failure-handling documentation is provided by Flink, covering things like restart on task failure, fixed-delay restart, etc.

For how long have I used the solution?

I have been using Apache Flink for almost nine months.

What do I think about the stability of the solution?

The uptime of our services has increased, resources are better utilized, restarts are automated on failures, alerts are triggered when the infrastructure breaches thresholds, and application failure metrics are logged, all of which has made the application robust. Scaling is something that can be tweaked per application usage and business rules, and Flink parameters are configurable. Altogether, it made our application more stable and maintainable.

What do I think about the scalability of the solution?

I haven't actually run a lot of performance or stress tests to see the scaling. There is no detailed documentation for scaling, and there is no fixed solution either; it depends on the use case. The Flink documentation describes REST APIs to scale your tasks dynamically, but I haven't personally tried them yet.

How are customer service and technical support?

I haven't actually used their technical support yet.

Which solution did I use previously and why did I switch?

I previously tried batch processing. It was not that effective in my use case; it was time consuming and not a real-time solution.

How was the initial setup?

The initial setup is straightforward. A newly joined person (with some experience in the software industry) would not find it difficult to understand and then contribute to the application. When you want to start writing the code, things get trickier and more complex, because if you are not familiar with what Apache Flink provides and what you need to do, it is very difficult. If you follow the documentation, various examples are provided, along with different deployment strategies.

What was our ROI?

It is a good solution for saving time and cost.

What's my experience with pricing, setup cost, and licensing?

Being open-source licensed, cost is not a factor. The community provides strong support.

What other advice do I have?

It is a good way to get your feet wet with streaming or big data processing applications, to understand the basic concepts of big data processing and how complex analytics and computations can be made simple. For example, if you want to analyze tweets or patterns, it's a simple use case where you just use the flink-twitter-connector and provide that as your input source to Flink. The stream of random tweets keeps coming, and then you can apply your own grouping, keying, and filtering logic to understand the concepts.
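That tweet exercise might look like this in plain Python, with the stream faked as a list (in a real job, the flink-twitter-connector would be the source and these steps would be DataStream operators):

```python
from collections import Counter

tweets = [
    {"user": "a", "text": "flink makes streaming easy"},
    {"user": "b", "text": "lunch was great"},
    {"user": "a", "text": "more flink windowing experiments"},
]

# filter: keep tweets mentioning the topic of interest
flink_tweets = [t for t in tweets if "flink" in t["text"]]

# key by user, then count per key (the grouping/keying step)
counts = Counter(t["user"] for t in flink_tweets)
```

The same filter/key/count shape is what you would express with Flink's filter, keyBy, and windowed aggregation operators.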

An important thing I learned while using Flink is that the basic concepts of windowing, transformations, and the DataStream API should be clear, or you should at least be aware of what is going to be used in your application; otherwise you might end up increasing processing time rather than decreasing it. You should also understand your data, process, pipeline, and flow: is Flink the right candidate for your architecture, or overkill?
It is flexible and powerful, is all I can say.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
AB
Partner / Head of Data & Analytics at a computer software company with 11-50 employees
Real User
Top 10
Gives us low latency for fast, real-time data, with useful alerts for live data processing

Pros and Cons

  • "The top feature of Apache Flink is its low latency for fast, real-time data. Another great feature is the real-time indicators and alerts which make a big difference when it comes to data processing and analysis."
  • "One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms similar in scope to how Apache Flink works with Cloudera. Apache Flink is a part of the same ecosystem as Cloudera, and for batch processing it's actually very useful but for real-time processing there could be more development with regards to the big data capabilities amongst the various ecosystems out there."

What is our primary use case?

We use Apache Flink to monitor the network consumption for mobile data in fast, real-time data architectures in Mexico. The projects we get from clients are typically quite large, and there are around 100 users using Apache Flink currently.

For maintenance and deployment, we split our team into two squads, with one squad that takes care of the data architecture and the other squad that handles the data analysis technology. Each squad is three members each.

What is most valuable?

The top feature of Apache Flink is its low latency for fast, real-time data. Another great feature is the real-time indicators and alerts which make a big difference when it comes to data processing and analysis.

What needs improvement?

One way to improve Flink would be to enhance integration between different ecosystems. For example, there could be more integration with other big data vendors and platforms similar in scope to how Apache Flink works with Cloudera. Apache Flink is a part of the same ecosystem as Cloudera, and for batch processing it's actually very useful but for real-time processing there could be more development with regards to the big data capabilities amongst the various ecosystems out there.

I am also looking for more options for what can be implemented in containers alone, without Kubernetes. I think our architecture would work really well with more options available to us in this sense.

Finally, it's a challenge to find people with the appropriate skills for using Flink. There are a lot of people who know what should be done better in big data systems, but there are still very few people with Flink capabilities.

For how long have I used the solution?

I've been using Apache Flink for about one year.

What do I think about the stability of the solution?

We have not really had many issues. 

What do I think about the scalability of the solution?

Scaling Apache Flink is easily done because we use Kubernetes and containers.

How are customer service and technical support?

I can't comment on Apache Flink's technical support but I feel that the documentation is complete and adequate for our needs when doing configuration or solving technical issues.

How was the initial setup?

The setup was complex. The most challenging part of it was identifying how to realize the real-time low latency, fail-over, and high availability within our container and Kubernetes architecture. The configuration of all of this was not simple and it took about a month to get fully set up.
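
Much of the failover and high-availability behavior mentioned here is driven by configuration rather than code. As a rough, version-dependent sketch (the key names and storage paths below are illustrative; they vary across Flink releases, so consult the configuration reference for your version), a `flink-conf.yaml` enabling checkpointing and Kubernetes-based high availability might contain entries along these lines:

```yaml
# Illustrative values only; adjust keys and paths for your Flink version.
execution.checkpointing.interval: 10s        # periodic state snapshots
state.backend: rocksdb                       # disk-backed, incremental state
state.checkpoints.dir: s3://bucket/flink/checkpoints
high-availability.type: kubernetes           # JobManager failover via k8s
high-availability.storageDir: s3://bucket/flink/ha
restart-strategy: fixed-delay
restart-strategy.fixed-delay.attempts: 3
```

Getting these settings to cooperate with the container and Kubernetes layers is typically where the month of setup effort described above goes.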

What about the implementation team?

We have two squads in our company that manage the implementation. One squad takes care of the data architecture and the other squad handles the data analysis technology.

What's my experience with pricing, setup cost, and licensing?

Apache Flink is open source so we pay no licensing for the use of the software.

Which other solutions did I evaluate?

Our clients had previously compared Apache Flink with Apache Spark and Apache Spark Streaming. The main advantage of Flink in comparison is that Flink handles complex processing better.

What other advice do I have?

My advice to others when using Apache Flink is to hire good people to manage it. When you have the right team, it's very easy to operate and scale big data platforms.

I would rate Apache Flink a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Amazon Web Services (AWS)
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner
Ertugrul Akbas
Manager at a computer software company with 11-50 employees
Real User
Top 5Leaderboard
Easy to use, stable, scalable, and has good community support with a lot of documentation

What is our primary use case?

We use Apache Flink in-house to develop the Tectonic platform.

What is most valuable?

It's usable and affordable.

It is user-friendly and the reporting is good.

What needs improvement?

There is a learning curve. It takes time to learn.

The initial setup is complex; it could be simplified.

For how long have I used the solution?

I have been using Apache Flink for more than one year.

I am using the latest version.

What do I think about the stability of the solution?

Apache Flink is a stable product. We have no issues with the stability.

What do I think about the scalability of the solution?

It's a very scalable solution. We have more than 100 people in our organization who are using it.

How are customer service and technical support?

We use community resources. There is a lot of documentation available online.

How was the initial setup?

The initial setup is complex.

What's my experience with pricing, setup cost, and licensing?

It's an open-source solution.

Which other solutions did I evaluate?

We did not evaluate competitors. We followed industry trends and, based on the experience and opinions of people from all over the world, selected Apache Flink.

What other advice do I have?

I would recommend Apache Flink to others who are interested in using it.

I would rate this solution an eight out of ten.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Product Categories
Streaming Analytics
Buyer's Guide
Download our free Apache Flink Report and get advice and tips from experienced pros sharing their opinions.