What is our primary use case?
In the simpler use case, we were just pumping in some data. We wanted an AWS service that would accept data in bursts. We were pushing in, for example, 500 records every 300 milliseconds, so we were trying to pump around 1,500 records per second into whichever streaming service we chose. That stream would then go to a consumer, for example Lambda. Lambda would consume the data, and ultimately we would process it and store it in DynamoDB.
This was the basic flow that we had, and we were looking for a service to fit it. At that point in time, the architects in our organization were encouraging us to use Kinesis because they wanted to see how it performed. Although we were looking at something as simple as SQS and SNS, we went with Kinesis.
There were a few considerations when we moved to Kinesis. The first was reliability. When I say reliability, I mean resilience, or the failure handling we thought was required for that use case, because we did not want to lose data. We also wanted the ability to replay from a certain point, because we were pumping in reports from a data source and we were always keeping track of the point at which we had stopped. So if data had already passed through Kinesis and then failed in the Lambda, we wanted to be able to retry and replay the previously processed stream.
That prompted us to use Kinesis, because it has the really good feature of being able to replay, for 24 hours (the default retention period), whatever you've already processed. That was one key feature that we thought we would need. Performance-wise, it performed really well. We also understood that it is meant for streaming, both video streaming and data streaming, and it does a good job with it. But we felt it was a more suitable service for video streaming, simply because when we pump data into Kinesis, we don't know how to test it other than waiting for the data to come out the other end, hooking in a Lambda, and extracting and processing the data.
That's the only way we can test it. That was a drawback, but it did not matter too much on this project. It was a simple use case and Kinesis served it really well, so we kept it as-is. The drawback did matter in the next project, which was bigger. It was an event-driven architecture that we were trying out on one of the features. When we went event-driven, a few of the services that Amazon offers now were not yet available.
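To make the burst concrete, here is a minimal sketch of how one burst could be turned into `PutRecords` request entries. The only hard fact it relies on is that a single `PutRecords` call accepts at most 500 records. The `device_id` field used as the partition key and the helper names `to_entries` and `chunk` are assumptions for illustration, not something from our project.

```python
import json
from typing import Iterable, Iterator

MAX_RECORDS_PER_CALL = 500  # documented PutRecords limit per request


def to_entries(records: Iterable[dict], key_field: str = "device_id") -> list[dict]:
    """Turn plain dicts into PutRecords entries (Data + PartitionKey).

    `key_field` is a hypothetical field name; use whatever identifies
    the source of a record in your own data.
    """
    return [
        {
            "Data": json.dumps(record).encode("utf-8"),
            "PartitionKey": str(record[key_field]),
        }
        for record in records
    ]


def chunk(entries: list[dict], size: int = MAX_RECORDS_PER_CALL) -> Iterator[list[dict]]:
    """Yield successive slices no larger than one PutRecords call allows."""
    for start in range(0, len(entries), size):
        yield entries[start:start + size]


# One burst of 500 records every 300 ms:
bursts_per_second = 1000 / 300
records_per_second = 500 * bursts_per_second  # about 1,667 records per second
```

Each slice from `chunk` would be passed as the `Records` argument of one `PutRecords` call. The arithmetic at the end shows that 500 records every 300 milliseconds is a bit above the rough "1,500 per second" figure.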
We thought of using Kinesis again to stream the data from one microservice to another in a proper microservice architecture. We were using this as a communication medium between microservices. This is where the testing was a little complicated for us. Ultimately, what we realized out of the entire exercise was that Kinesis may not have been the right choice of service for us for our use case. But what we discovered were the benefits of using Kinesis and also the limitations in certain use cases.
The biggest lesson learned for us was that before you take up anything like Kinesis, which is a big AWS service, a proof of concept (POC) has to be done to see whether it really suits the use case. That is what we ultimately realized. Before that, there were a few other reasons why we chose Kinesis over DynamoDB Streams. Ultimately it was from one microservice to another, and each microservice had its own DynamoDB data store.
We were thinking of using DynamoDB Streams instead of Kinesis to keep things simple. But it turned out that DynamoDB Streams has a limitation: as we understood it, whatever comes out of a DynamoDB stream can be consumed by only a single client. With Kinesis that doesn't matter. Any number of data sources can feed in, and whatever Kinesis publishes can be consumed by any number of clients. That is why we went with Kinesis, to see how it performed. Performance mattered because we have to handle a crazy load: we are part of the wagering industry, online betting, which needs peak performance. In Australia it's a regulated market and one of the busiest businesses. Performance is really important here because there are around 10 to 15 prominent competitors, and if we are to stand out, our performance has to be beyond the customer's expectation.
So, with that in mind, they knew our performance had to scale up. That is where we found the advantage of using Kinesis. It's been reliable. It did not fail to publish, except when we pumped in more data than Kinesis can take in.
There is a limit that we discovered. I don't remember the numbers there. But we did manage to break Kinesis by pumping in too much data.
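When a stream is pushed past its write limit, `PutRecords` does not fail as a whole. It reports the rejected entries individually, through `FailedRecordCount` and an `ErrorCode` on each failed result. A minimal sketch of resending only the rejected entries with exponential backoff might look like this. The function name `put_with_retry` and its defaults are hypothetical, and `client` is assumed to be a boto3 Kinesis client such as `boto3.client("kinesis")`.

```python
import time


def put_with_retry(client, stream_name, entries, max_attempts=5, base_delay=0.1):
    """Send entries with PutRecords, resending only the ones that failed.

    `client` is assumed to be a boto3 Kinesis client. Returns the entries
    that were still rejected after `max_attempts` tries.
    """
    pending = list(entries)
    for attempt in range(max_attempts):
        if attempt:
            # Exponential backoff before every retry (not before the first try).
            time.sleep(base_delay * 2 ** (attempt - 1))
        response = client.put_records(StreamName=stream_name, Records=pending)
        # Results come back in request order; rejected ones carry an ErrorCode
        # such as ProvisionedThroughputExceededException.
        pending = [
            entry
            for entry, result in zip(pending, response["Records"])
            if "ErrorCode" in result
        ]
        if not pending:
            return []
    return pending
```

Note that resending rejected entries can change the order in which records land in the stream, so this kind of retry only suits cases where strict ordering across a batch is not required.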
How has it helped my organization?
The major advantage with Amazon Kinesis is the availability. Additionally, the reliability is awesome when it comes to Kinesis. Kinesis also offers the replay.
It is incredibly fast. Ingesting, buffering, and processing the data are all really fast. With AWS you always get the dashboard for monitoring. That is a really good way for us to see how Kinesis is performing. Otherwise there is no way for us to know what's happening within Kinesis other than the Lambda kicking in and processing. So the Lambda logs were our indirect way of looking into Kinesis.
The dashboarding that AWS provides out of the box for monitoring the performance of Kinesis is quite nice. Also, it is a fully managed service, so we don't need to worry about what happens behind Kinesis. That was another big win for us. We did not have to worry about how to maintain or manage Kinesis in general. It is kind of serverless.
The scalability was quite acceptable. It can take in and process a huge amount of data, although there is a limit. We can have any number of data sources coming in, and it can ingest all of them and publish to wherever you want.
You can design your code in such a way that the Lambda that processes whatever Kinesis publishes segregates the data coming in from multiple data sources, based on the logic implemented there. That is a nice feature: ingesting data from multiple sources and being able to publish it to multiple destinations.
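As an illustration of that kind of segregation, the sketch below decodes the records in a Kinesis-triggered Lambda event and groups them by where they came from. It assumes each payload is JSON with a `source` field, which is a hypothetical convention and not something Kinesis provides. The base64-encoded `data` field under `kinesis` is how Lambda delivers the payload.

```python
import base64
import json
from collections import defaultdict


def group_by_source(event: dict) -> dict:
    """Decode a Kinesis Lambda event and bucket payloads by their source."""
    grouped = defaultdict(list)
    for record in event.get("Records", []):
        # Lambda delivers the Kinesis payload base64-encoded.
        payload = json.loads(base64.b64decode(record["kinesis"]["data"]))
        # "source" is an assumed field set by the producer.
        grouped[payload.get("source", "unknown")].append(payload)
    return dict(grouped)


def handler(event, context):
    """Hypothetical Lambda entry point: count what arrived from each source."""
    grouped = group_by_source(event)
    # Real code would route each group to its own processing logic here.
    return {source: len(items) for source, items in grouped.items()}
```

In practice each group would be handed to its own processing function before being stored, for example in DynamoDB.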
What is most valuable?
The feature that I've found most valuable is the replay. That is one of the most valuable in our business. We are business-to-business, so being able to replay for 24 hours was an important feature.
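A replay of this kind can be sketched with the `AT_TIMESTAMP` shard iterator type, which starts reading from a given point in time as long as the records are still inside the stream's retention period. This is a minimal sketch, not what we ran in production. `client` is assumed to be a boto3 Kinesis client, and `replay_shard` and `hours_ago` are hypothetical helper names.

```python
from datetime import datetime, timedelta, timezone


def hours_ago(hours: float) -> datetime:
    """Timestamp `hours` before now, in UTC."""
    return datetime.now(timezone.utc) - timedelta(hours=hours)


def replay_shard(client, stream_name, shard_id, since, limit=1000):
    """Yield records from one shard, starting at the timestamp `since`.

    `client` is assumed to be a boto3 Kinesis client. Only records still
    inside the retention period (24 hours by default) can be returned.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="AT_TIMESTAMP",
        Timestamp=since,
    )["ShardIterator"]
    while iterator:
        page = client.get_records(ShardIterator=iterator, Limit=limit)
        yield from page["Records"]
        if page.get("MillisBehindLatest", 0) == 0:
            break  # caught up with the tip of the shard
        iterator = page.get("NextShardIterator")
```

A call such as `replay_shard(client, "my-stream", "shardId-000000000000", hours_ago(2))` would re-read the last two hours of one shard. The stream name is a placeholder, and a real loop would also need to pause between calls to stay within the per-shard read limits.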
In our use case Kinesis was able to handle the rate at which we were pumping in data and it could publish the data to whatever destination, be it Lambda or any other consumer.
We were seeing a delay in the Lambda's processing time and in the subsequent storing into DynamoDB. We had assumed that the rate at which we were pumping in data would also be the rate at which the data was published and processed. But we saw that it was not working out that way. It was not Kinesis itself; the subsequent parts of our application tended not to keep up with Kinesis. So the business asked us for the ability to get back to a certain point in time and replay the entire thing. That way there is a record to fall back on if there is an error during processing.
The ordering is another big thing for us. Kinesis is known for maintaining the order in which the data is ingested within a shard, and we can configure Kinesis to ensure that the ordering is maintained. The order in which the data is published is also important for us. That is why the business was okay even if processing failed at record 1,000: they were okay to start from record 500 again and work back up to 1,000. They wanted to ensure that there was no scope for failure there. That is why the replay feature was useful for us, and why both performance and replay are important. When I say performance, I mean reliability. Kinesis has a built-in replay mechanism that also came in handy for us.
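The ordering guarantee is tied to the partition key. Kinesis hashes each partition key with MD5 into a 128-bit integer and routes the record to the shard whose hash key range contains that value, so records that share a key land on the same shard in the order they were written. The sketch below reproduces that mapping under the assumption that the shards split the hash space into equal ranges, which is not guaranteed after resharding. The function names are hypothetical.

```python
import hashlib

HASH_SPACE = 2 ** 128  # partition keys hash into a 128-bit key space


def hash_key(partition_key: str) -> int:
    """MD5 of the partition key, read as a 128-bit integer."""
    return int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)


def shard_index(partition_key: str, shard_count: int) -> int:
    """Index of the shard that owns the key, assuming equal hash ranges."""
    return hash_key(partition_key) * shard_count // HASH_SPACE
```

The practical consequence is that anything that must stay in order, for example all events for one customer, should share one partition key, while unrelated records should use different keys so the load spreads across shards.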
These were the crucial things that we were looking at, and it worked quite well.
What needs improvement?
In general, the pain point for us was that once the data gets into Kinesis there is no way for us to understand what's happening, because Kinesis divides everything into shards. If we wanted to understand what was happening with a particular shard, whether it had been published or not, we could not. Even logging, if we wanted some, is at the shard level. That is something we thought we needed then, but later we realized that Kinesis was not built for that. They may have improved this by now; I have not been in touch with AWS for the last five or six months, since I joined my current organization, which uses Azure. I did not get to experiment with Kinesis much after that.
Kinesis was built for something else. We used it for one purpose and expected a feature that may not have been part of the design of the service. It was almost like a black box for us, because once the data comes in we need to rely on the Lambda itself to let us know. When a Kinesis record comes in and is processed, we log it from the Lambda, and that is where we would know, "Oh, okay, this guy has come in, this guy has come in." We hoped for a better way of tracking the shards being processed, or how records streamed within Kinesis.
We wanted to have a look at that, but it was not available then, and it may not be available now. Overall that was the only thing that we felt was lacking. Our use case may not have been the most ideal one, but other than that we did not have many qualms with Kinesis. We felt we could have simplified the entire design by simply using SNS and SQS, because we have much better visibility into what happens within SNS and SQS.
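The Lambda-side logging described here can be sketched as follows. Each record in a Kinesis-triggered Lambda event carries an `eventID` of the form `shardId-...:sequenceNumber`, plus the partition key and sequence number under `kinesis`, so the consumer can at least record which shard each item arrived from. The helper names and the logger name are assumptions for illustration.

```python
import logging

logger = logging.getLogger("kinesis-consumer")  # hypothetical logger name


def describe_record(record: dict) -> dict:
    """Pull shard ID, sequence number, and partition key out of one event record."""
    shard_id, _, _ = record["eventID"].partition(":")
    return {
        "shard_id": shard_id,
        "sequence_number": record["kinesis"]["sequenceNumber"],
        "partition_key": record["kinesis"]["partitionKey"],
    }


def log_batch(event: dict) -> list[dict]:
    """Log one line per record so each shard's progress shows up in the Lambda logs."""
    seen = [describe_record(record) for record in event.get("Records", [])]
    for info in seen:
        logger.info(
            "received shard=%s seq=%s key=%s",
            info["shard_id"], info["sequence_number"], info["partition_key"],
        )
    return seen
```

This only shows what reached the consumer. It says nothing about records still sitting in a shard, which is the visibility gap described above.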
For how long have I used the solution?
I have used Amazon Kinesis for a couple of projects starting from August 2019 until July 2020. I used Amazon Kinesis in exactly two projects in fact, one after the other.
What do I think about the scalability of the solution?
In terms of scalability, there is a limit which is documented by Amazon. But when we started using it, we didn't know that. We did not evaluate its complete documentation. Of course we went through the aspects that we wanted to understand and we made the choice. But it did break at a certain point.
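For reference, the documented write limit for a provisioned stream is per shard: up to 1,000 records per second and up to 1 MiB per second. A rough sizing check, done before load testing, could look like the sketch below. The 200-byte average record size in the usage note is purely an assumed figure, since the real size was never measured for this review.

```python
import math

# Documented per-shard write limits for provisioned streams.
RECORDS_PER_SECOND_PER_SHARD = 1000
BYTES_PER_SECOND_PER_SHARD = 1024 * 1024  # 1 MiB


def shards_needed(records_per_second: float, avg_record_bytes: int) -> int:
    """Smallest shard count that satisfies both the record and the byte limit."""
    by_count = math.ceil(records_per_second / RECORDS_PER_SECOND_PER_SHARD)
    by_bytes = math.ceil(
        records_per_second * avg_record_bytes / BYTES_PER_SECOND_PER_SHARD
    )
    return max(1, by_count, by_bytes)
```

With 500 records every 300 milliseconds (about 1,667 per second) and an assumed 200 bytes per record, `shards_needed(1667, 200)` gives 2, so a single-shard stream would be throttled at that rate. The estimate also assumes partition keys spread the load evenly across shards.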
It was okay for us simply because we could do with a lower pumping rate. So, it did not cause too much of a hazard for the business as such, but we did manage to break Kinesis.
Overall, what we realized was that for simple use cases where you need reliable streaming, Amazon Kinesis works really well. But for an event-driven architecture it may not be the best choice.
That's what we figured out at the end of our project. The project was successful. It served its purpose. But the amount of support that we had to provide to see that the entire infrastructure holds up to the load was high.
We felt that we could have done with an easier adaptation of the same architecture. We could have gone with an easier implementation, by probably choosing SNS and SQS over Kinesis in our use case. So, lessons learned.
This is all that we worked on with Kinesis, and this is what we figured out after close to a year of working with it. One project was no problem at all; Kinesis did more than expected. In the other one we kind of hit the breaking point of Kinesis and realized that it may not be the right choice in that scenario. But it was still okay. We left it there, and it served its purpose.
How are customer service and technical support?
We had an Amazon technical advisor who visited us once every week on the same day. He would be with us, and we could reach out to him for suggestions as to what we could use and what we should do. He would help us with whatever queries we gave him. Even if he did not know the answer, he went back to the Amazon experts and got it for us. But in the case of Kinesis, the choice was driven more by the architecture teams here, who wanted us to try it out and see how it performs.
We did go to the Amazon technical support guy who was available to us to understand the limitations and the use cases. He did help us, but we were almost at the end of our implementation when we went to him, so we could not change course. But yes, his inputs were definitely valuable for us to understand Kinesis better.
How was the initial setup?
In terms of initial setup, Kinesis is readily available for us to use. All we need to do is see what stack we are using. For example, our stack consists of a Lambda, a Kinesis stream, DynamoDB, and some data source, which is probably another Lambda or something. So a Lambda feeds data into Kinesis and Kinesis publishes it into another Lambda. I'm just giving an example. All four components come under one stack, so there's not much to set up. We used CloudFormation to ensure that we maintain stacks separately, so Kinesis had to be part of the CloudFormation stack template. It also needs permissions on both sides: we just need to give the data source and the destination permission to access Kinesis. Other than that, there's nothing much to set up because Kinesis is a fully managed service.
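A stack of that shape can be sketched as a CloudFormation template built in Python and dumped to JSON. This is a minimal illustration, not our actual template. The logical IDs, the `ConsumerFunction` parameter, the shard count, and the batch size are all assumptions, and the DynamoDB table and the producer Lambda are left out to keep it short.

```python
import json


def build_template(shard_count: int = 2, retention_hours: int = 24) -> dict:
    """Return a minimal CloudFormation template: one stream wired to one Lambda."""
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Parameters": {
            # Name or ARN of an existing consumer function (hypothetical parameter).
            "ConsumerFunction": {"Type": "String"},
        },
        "Resources": {
            "EventStream": {
                "Type": "AWS::Kinesis::Stream",
                "Properties": {
                    "ShardCount": shard_count,
                    "RetentionPeriodHours": retention_hours,
                },
            },
            "ConsumerMapping": {
                "Type": "AWS::Lambda::EventSourceMapping",
                "Properties": {
                    "EventSourceArn": {"Fn::GetAtt": ["EventStream", "Arn"]},
                    "FunctionName": {"Ref": "ConsumerFunction"},
                    "StartingPosition": "TRIM_HORIZON",
                    "BatchSize": 100,
                },
            },
        },
    }


# IAM actions the two sides typically need on the stream (not a complete policy).
PRODUCER_ACTIONS = ["kinesis:PutRecord", "kinesis:PutRecords"]
CONSUMER_ACTIONS = [
    "kinesis:DescribeStream",
    "kinesis:GetRecords",
    "kinesis:GetShardIterator",
    "kinesis:ListShards",
]
```

Calling `json.dumps(build_template(), indent=2)` produces the template text. The two action lists correspond to the permissions mentioned above: the producer needs to write to the stream, and the consumer's execution role needs to read from it.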
What about the implementation team?
We were four developers and one principal developer, who was guiding us from the architecture standpoint during setup.
What's my experience with pricing, setup cost, and licensing?
I think there is a paid version only, there is no free version. I think it is possibly on the expensive side.
I did not go too deep into pricing, because our business did not care about pricing that much. They just wanted the product to be solid and level at all times. The business is generally conservative about services and pricing. But, this was a different case for us where the price did not matter.
I did not explore that much into the pricing of Kinesis, per se.
Which other solutions did I evaluate?
I'm aware of Kafka streaming, but I have not used it in any project. Kinesis was the only streaming service that I used.
Here, in my current organization, we mostly use Azure Web Apps, Azure WebJobs, and Function Apps, which are similar to Lambda. My exposure here is not as extensive as it was in my previous organization. There the entire infrastructure was on cloud, but here it's only partially on cloud, so my exposure to Azure services is limited at this point.
What other advice do I have?
With my limited exposure to Kinesis, and even with the pain points and probably not using it properly, we did see that it was successful. Having said all that, on a scale of one to ten I would give Kinesis an eight.
Which deployment model are you using for this solution?
If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?
Amazon Web Services (AWS)