Initially, we ran our own servers; later, eBay built its own infrastructure. Now it's deployed on the eBay cloud.
Our primary use case is real-time or near-real-time aggregation. For example, we compute counts, sums, minimums, maximums, and distinct counts for the metrics we care about, and we do this in real time. Say you run an e-commerce company and want to measure different metrics. Take risk as an example: you want to check whether a particular seller on your site is doing something fishy. What is their behavior? How many listings have they posted in the past five minutes, one hour, one day, or one year? You want to measure this over time.
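To make the idea of a per-seller, time-windowed aggregation concrete, here is a minimal in-memory sketch in plain Python. It is not Flink's API and not the system described here. The `ListingEvent` fields (price, category), the `SlidingWindowAggregator` class, and the choice of "distinct categories" as the distinct count are all hypothetical. It also assumes events arrive in timestamp order and that queries use non-decreasing `now` values; Flink handles out-of-order data with watermarks, which this sketch ignores.

```python
from collections import defaultdict, deque
from dataclasses import dataclass


@dataclass(frozen=True)
class ListingEvent:
    # Hypothetical event shape: one record per listing a seller posts.
    seller_id: str
    timestamp: float  # seconds since epoch
    price: float
    category: str


class SlidingWindowAggregator:
    """Per-seller count/sum/min/max/distinct count over the last `window_seconds`.

    Assumes events are added in timestamp order and that stats() is called
    with non-decreasing `now`, because old events are evicted permanently.
    """

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        # seller_id -> events in arrival (= timestamp) order
        self._events = defaultdict(deque)

    def add(self, event: ListingEvent) -> None:
        self._events[event.seller_id].append(event)

    def _evict(self, seller_id: str, now: float) -> deque:
        # Drop events that fell out of the window (now - window_seconds, now].
        q = self._events[seller_id]
        cutoff = now - self.window_seconds
        while q and q[0].timestamp <= cutoff:
            q.popleft()
        return q

    def stats(self, seller_id: str, now: float) -> dict:
        q = self._evict(seller_id, now)
        prices = [e.price for e in q]
        return {
            "count": len(prices),
            "sum": sum(prices),
            "min": min(prices) if prices else None,
            "max": max(prices) if prices else None,
            "distinct_categories": len({e.category for e in q}),
        }


# Usage: one aggregator per window length, e.g. five minutes.
five_min = SlidingWindowAggregator(window_seconds=300)
five_min.add(ListingEvent("s1", 0, 10.0, "shoes"))
five_min.add(ListingEvent("s1", 100, 5.0, "shoes"))
five_min.add(ListingEvent("s1", 250, 20.0, "phones"))
print(five_min.stats("s1", now=260))
```

Keeping a separate aggregator per window (five minutes, one hour, one day) mirrors the question "how many listings in the past N?"; a production stream processor would instead keep incremental state per key and window rather than buffering every raw event.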
This data is very important from a business-metrics point of view. Often it is delayed by one day because it comes from offline analytics: you run ETL jobs to compute these aggregations, which is fine for offline business metrics. But risk detection for an online business needs the numbers right away, in real time. That is where those batch systems fall short and where Apache Flink helps. Combined with a Lambda architecture, you can still get real-time results by running a parallel system that captures the very latest data alongside the batch pipeline.
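The Lambda-architecture idea can be sketched as a query-time merge of two views: a batch view that the daily ETL has computed up to some cutoff, and a speed layer that covers only events after that cutoff. The sketch below is a hypothetical illustration with invented names (`merge_counts`, `batch_view`, `batch_cutoff`), not the setup described in the interview, and it covers only an additive metric (listing count per seller).

```python
from collections import Counter
from typing import Iterable, Mapping, Tuple


def merge_counts(
    batch_view: Mapping[str, int],
    batch_cutoff: float,
    recent_events: Iterable[Tuple[str, float]],
) -> Counter:
    """Serving-layer merge for listing counts per seller (hypothetical sketch).

    batch_view:    counts from the offline ETL, covering events with ts <= batch_cutoff
    recent_events: (seller_id, timestamp) pairs from the real-time speed layer
    """
    merged = Counter(batch_view)
    for seller_id, ts in recent_events:
        # Only count events the batch job has not seen yet, to avoid double counting.
        if ts > batch_cutoff:
            merged[seller_id] += 1
    return merged


# Usage: yesterday's batch counts plus today's streamed listings.
batch = {"s1": 40, "s2": 3}
recent = [("s1", 999), ("s1", 1001), ("s3", 1500), ("s1", 1200)]
print(merge_counts(batch, batch_cutoff=1000, recent_events=recent))
```

Counts, sums, minimums, and maximums merge this simply. Distinct counts do not: adding two distinct counts can double-count, so each layer would need to keep the underlying sets or a mergeable sketch such as HyperLogLog.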