Apache Flink Primary Use Case

-Rahul Agarwal
Sr. Software Engineer at a tech services company with 10,001+ employees
Initially, we created our own servers and then eBay created their infrastructure. Now it's deployed on the eBay cloud. Our primary use case is real-time or near-real-time aggregations. For example, we compute count, sum, min, max, and distinct counts for different metrics that we care about, but we do this in real time. Say you have an e-commerce company and you want to measure different metrics. Take risk as an example: you want to check whether a particular seller on your site is doing something fishy. What is their behavior? How many listings do they have in the past five minutes, one hour, one day, or one year? You want to measure this over time because this data is very important from a business-metric point of view. Often this data is delayed by a day via offline analytics; doing ETL for these aggregations is fine for offline business metrics. But when you want to do risk detection for an online business, it needs to happen right away, in real time, and that's where those systems fail and where Apache Flink helps. Combined with a Lambda architecture, you can get these metrics in real time with the help of a parallel system that captures the very latest data.
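The windowed count/sum/min/max metrics described above can be sketched in plain Python (rather than Flink's DataStream API). The event shape, the seller IDs, and the 5-minute tumbling window are assumptions chosen for illustration:

```python
from collections import defaultdict

# Hypothetical event: (seller_id, event_time_seconds, listing_value).
# A minimal sketch of tumbling-window aggregation, assuming
# 5-minute (300-second) windows.
WINDOW_SECONDS = 300

def aggregate_listings(events):
    """Group events into (seller, window) buckets and compute
    count, sum, min, and max of the values in each bucket."""
    buckets = defaultdict(list)
    for seller_id, event_time, value in events:
        window_start = (event_time // WINDOW_SECONDS) * WINDOW_SECONDS
        buckets[(seller_id, window_start)].append(value)
    return {
        key: {
            "count": len(vals),
            "sum": sum(vals),
            "min": min(vals),
            "max": max(vals),
        }
        for key, vals in buckets.items()
    }

events = [("s1", 10, 2), ("s1", 120, 5), ("s1", 400, 1), ("s2", 30, 7)]
metrics = aggregate_listings(events)
print(metrics[("s1", 0)])  # {'count': 2, 'sum': 7, 'min': 2, 'max': 5}
```

In Flink itself, the same idea would be expressed with `keyBy` on the seller and a tumbling event-time window, with the engine handling out-of-order events and state for you.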
Sandesh Deshmane
Solution Architect at a tech vendor with 501-1,000 employees
We have our own infrastructure on AWS. We deploy Flink on a Kubernetes cluster in AWS, managed by our internal DevOps team. We also use Apache Kafka; that is where we get our event streams. We get millions of events through Kafka, between 300K and 500K events per second through that channel. We aggregate the events and generate reporting metrics based on the actual events that are recorded. Certain real-time, high-volume events come through Kafka like any other stream, and we use Flink for aggregation in this case. We read these high-volume events from Kafka and then aggregate them; there is a lot of business logic running behind the scenes. We use Flink to aggregate those messages and send the results to a database so that our API layer or BI users can read directly from the database.
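The read-aggregate-sink pattern described here can be sketched in plain Python, with an in-memory dict standing in for the database and a list standing in for the Kafka topic. The event fields are assumptions for illustration:

```python
# A sketch of the Kafka -> aggregate -> database pattern, in plain
# Python instead of Flink's Kafka connector and JDBC sink. The
# (key, nbytes) event shape and the dict "database" are assumptions.
def run_pipeline(event_stream, database):
    """Aggregate a per-key event count and byte total, then 'sink'
    the running result into the database mapping."""
    for key, nbytes in event_stream:
        count, total = database.get(key, (0, 0))
        database[key] = (count + 1, total + nbytes)
    return database

db = {}
stream = [("clicks", 120), ("views", 300), ("clicks", 80)]
run_pipeline(stream, db)
print(db["clicks"])  # (2, 200)
```

At the volumes quoted (hundreds of thousands of events per second), the point of Flink is that this per-key aggregation runs partitioned across many workers with fault-tolerant state, which a single-process loop like this cannot do.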
Jyala Rahul Jyala
Sr Software Engineer at a tech vendor with 10,001+ employees
We are using Flink as a pipeline for data cleaning. We are not using all of the features of Flink; rather, we are using the Flink Runner on top of Apache Beam. We are a CRM product-based company with a lot of customers that we provide our CRM for, and we like to give them as much insight as we can based on their activities, including how many transitions they do over a particular time. We also have other services, including machine learning, and the raw data feeding them is not very clean, which means you would otherwise have to clean it up manually; working with Big Data in real time under those conditions is not very good. We use Apache Flink with Apache Beam as part of our data cleaning pipeline. It performs data normalization and other steps for cleaning the data, which ultimately provides customers with the feedback they want. We also have a separate machine learning feature that customers can optionally purchase.
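A cleaning/normalization step like the one described can be sketched in plain Python. The field names and rules are assumptions; in the setup above, this logic would run as Beam transforms executed by the Flink Runner:

```python
# A minimal sketch of record cleaning and normalization. The CRM
# field names (customer_id, email, country) are hypothetical.
def normalize_record(record):
    """Trim and case-normalize text fields; drop records missing an id."""
    if not record.get("customer_id"):
        return None
    return {
        "customer_id": record["customer_id"],
        "email": record.get("email", "").strip().lower(),
        "country": record.get("country", "").strip().upper(),
    }

def clean(records):
    """Apply normalization and filter out rejected records."""
    return [r for r in (normalize_record(rec) for rec in records) if r]

raw = [
    {"customer_id": "c1", "email": " Ana@Example.COM ", "country": "us"},
    {"email": "no-id@example.com"},  # rejected: no customer_id
]
print(clean(raw))
```

In Beam terms, `normalize_record` would be a `ParDo`/`Map` transform and the filter another stage of the same pipeline, so the cleaning scales out instead of being done manually.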
Find out what your peers are saying about Apache, Amazon, VMware and others in Streaming Analytics. Updated: October 2020.
442,986 professionals have used our research since 2012.
Vinod Iyer
Principal Software Engineer at a tech services company with 1,001-5,000 employees
The last POC we did was for map-making; I work for a map-making company. India is one ADR, and within it you have states, within those districts, and within those cities. There are certain hierarchical areas. When you go to Google and search for a city within India, you see the entire hierarchy: it falls in India. We get the data from third-party sources, government sources, or wherever else we can. This data is geometry; it's not a straightforward index. If we get raw geometry, we get the entire map and the layout, and we do geometry processing. Our POC was about processing geometry in a distributed way: the exploration I did was about distributing this geometry and breaking this big geometry into pieces.
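Breaking one big geometry into pieces for distributed processing can be sketched as tiling its bounding box, so each tile becomes an independent unit of work. The tile size and the bounding-box representation are assumptions; real geometry processing would use a spatial library:

```python
# A sketch of splitting a large geometry's bounding box into tiles
# so each piece can be processed in parallel by a separate worker.
# Boxes are (min_x, min_y, max_x, max_y); units are arbitrary.
def split_bbox(min_x, min_y, max_x, max_y, tile_size):
    """Yield tile bounding boxes covering the input box."""
    x = min_x
    while x < max_x:
        y = min_y
        while y < max_y:
            yield (x, y, min(x + tile_size, max_x), min(y + tile_size, max_y))
            y += tile_size
        x += tile_size

tiles = list(split_bbox(0, 0, 10, 10, 5))
print(len(tiles))  # 4 tiles of 5x5
```

In a Flink job, each tile could then be a keyed element of the stream, letting the geometry work for different tiles run on different task slots.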
Hitesh Baid
Senior Software Engineer at a tech services company with 5,001-10,000 employees
For services that need real-time, fast updates and have a lot of data to process, Flink is the way to go. Apache Flink with Kubernetes is a good combination. Data transformation, grouping, keying, and state management are some of the features of Flink. My use case is to provide the latest data as fast as possible, in real time.
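The keying and state management mentioned above can be sketched in plain Python: one state value kept per key, loosely mirroring how Flink scopes state to the current key of a keyed stream. The running-maximum metric here is an assumption chosen for illustration:

```python
# A sketch of per-key ("keyed") state. In Flink, each key of a keyed
# stream gets its own state value; here a dict plays that role.
class KeyedState:
    def __init__(self):
        self._state = {}

    def update_max(self, key, value):
        """Update and return the running maximum seen for this key."""
        current = self._state.get(key)
        if current is None or value > current:
            self._state[key] = value
        return self._state[key]

state = KeyedState()
for key, value in [("a", 3), ("b", 1), ("a", 2), ("a", 9)]:
    state.update_max(key, value)
print(state.update_max("a", 0))  # 9
```

The difference in Flink is that this state is checkpointed and partitioned with the keys, so it survives failures and scales horizontally.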