Amazon Kinesis Competitors and Alternatives

Get our free report covering Apache, Apache, Amazon, and other competitors of Amazon Kinesis. Updated: October 2021.
542,029 professionals have used our research since 2012.

Read reviews of Amazon Kinesis competitors and alternatives

Mohammad Masudu Rahaman
Founder at Talkingdeal.com LLC
Real User
Top 10
Good logging mechanisms, a strong infrastructure and pretty scalable

Pros and Cons

  • "There are a lot of options in Spring Cloud. It's flexible in terms of how we can use it. It's a full infrastructure."
  • "The configurations could be better. Some configurations are a little bit time-consuming in terms of trying to understand using the Spring Cloud documentation."

What is our primary use case?

Most of our use cases involve building data pipelines. Multiple microservices run in the Spring Cloud Data Flow infrastructure, and together they form a pipeline that processes data step by step using Kafka. Most of the sources, processors, and sinks are developed to fit the customer's business requirements or use cases.

In the example of the bank we work with, we are building a document analysis pipeline. There are some defined sources where we get the documents. Later on, we extract and summarize information from the documents and export the data to multiple destinations, such as the POGI database and/or a Kafka topic.

For CoreLogic, we imported data into Elasticsearch. We had a BigQuery data source; from there we transformed the data and then imported it into the Elasticsearch clusters. That was the ETL solution.
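The source, transform, and load flow described above can be pictured with a minimal sketch. This is plain Python, not Spring Cloud Data Flow code: the function names, the sample rows, and the in-memory list standing in for the destination are all assumptions made for illustration.

```python
from typing import Callable, Iterable, Iterator, List

Record = dict


def source() -> Iterator[Record]:
    """Hypothetical source: yields raw rows, standing in for a query result."""
    yield {"id": 1, "address": " 12 Main St "}
    yield {"id": 2, "address": "98 Oak Ave"}


def processor(records: Iterable[Record]) -> Iterator[Record]:
    """Transform step: trims and upper-cases the address of each row."""
    for record in records:
        yield {**record, "address": record["address"].strip().upper()}


def make_sink(store: List[Record]) -> Callable[[Iterable[Record]], int]:
    """Build a sink that appends rows to `store` (a stand-in for the destination)."""
    def sink(records: Iterable[Record]) -> int:
        count = 0
        for record in records:
            store.append(record)
            count += 1
        return count
    return sink


def run_pipeline(src, steps, sink) -> int:
    """Chain source -> processors -> sink and return the number of rows loaded."""
    stream = src()
    for step in steps:
        stream = step(stream)
    return sink(stream)


index: List[Record] = []
loaded = run_pipeline(source, [processor], make_sink(index))
print(loaded, index)
```

Because each step only consumes and produces an iterable of records, a processor can be swapped or added without touching the source or the sink, which is the kind of flexibility the pipeline model is meant to give.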

How has it helped my organization?

For example, cloud platforms such as PCF each have their own microservice management infrastructure. However, if you have CDF running, the developer has more control over the messaging platform. Being able to control how data flows from one microservice to another is great. As a developer, I feel more in control. Some hosted services or hosted infrastructure let us run smaller microservices, but those are infrastructure dependent. If anything happens, such as a bug or another issue, it can be difficult to trace the problem. That's not true here. CDF is really good at logging. As a developer, I can use my Spring Boot logging mechanism to check what the problem is, and that helps a lot.

I've been working with the solution for eight or nine years at this point. I feel more comfortable with the infrastructure. CDF is actually infrastructure for Spring Boot applications that run inside it, either as a task or as a long-lived microservice in the data pipeline. If you have a Spring Cloud Data Flow server implemented in your project, that means you have your own data pipeline architecture, and you can design the flow of your data processing as you wish.

There is also logging for short-lived tasks. Logging when a task starts and when it stops helps provide more transparency.

In terms of direct benefit to the company, it spends less money. If you use a hosted BPM or some other hosted service to orchestrate your microservices, you need to pay fees to a company to manage it. However, if your developers can manage CDF, that management cost is reduced. I'm not sure of the actual hard costs; however, I am aware of the savings.

What is most valuable?

Mostly we enjoy the orchestration of microservices, as you can have a Spring Boot application and build your own steps. You can deal with as many processors as you need. There is also Spring Cloud Task inside CDF, which is helpful for temporary work. You can trigger a task, it does something in a few microseconds, and then it finishes. No memory stays occupied by a running microservice. You can just start the microservice, it does some work, and then it dies and the memory is released. These kinds of temporary activities are also helpful.

It's a low-resource type of product. You have a scheduler running, and a lot of smaller tasks to be done by the scheduler. Therefore, you don't need to keep the microservice running. You can trigger the task, the task is executed, the JAR execution ends, and the memory is released. So you never need to keep any long-lived microservices running.
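The run-then-exit pattern can be illustrated with a small sketch. It is not how Spring Cloud Task is implemented; it only assumes that a task is a separate process that starts on demand and exits when its work is done, at which point the operating system reclaims its memory. The helper name and the inline task code are made up for the example.

```python
import subprocess
import sys


def run_short_lived_task(code: str, timeout_seconds: float = 30.0) -> subprocess.CompletedProcess:
    """Start a separate Python process, wait for it to exit, and return its result.

    The child process holds memory only while it runs; nothing stays
    resident once it has exited.
    """
    return subprocess.run(
        [sys.executable, "-c", code],
        capture_output=True,
        text=True,
        timeout=timeout_seconds,
    )


# Hypothetical task body: do a small piece of work, print the result, exit.
task_code = "print(sum(range(10)))"
result = run_short_lived_task(task_code)
print(result.returncode, result.stdout.strip())
```

A scheduler would call something like `run_short_lived_task` whenever a trigger fires, instead of keeping a worker process alive between runs.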

There are a lot of options in Spring Cloud. It's flexible in terms of how we can use it. It's a full infrastructure.

What needs improvement?

The configurations could be better. Some configurations are a little bit time-consuming in terms of trying to understand using the Spring Cloud documentation. 

The documentation on offer is not that good. The Spring Cloud Data Flow documentation for the configurations is not exactly clear. Sometimes they provide examples that are not complete. Some parts are described in the documentation but not shown in example code. When we tried to implement multiple configurations, for example when we integrated CDF with PCF (Pivotal Cloud Foundry), there were issues. PCF has a workspace concept; however, when we tried to implement the workspace in CDF, some kind of boundary configuration was not integrating properly. We then went to the documentation and tried to customize it a little at the configuration level, not the code level, to get the solution working.

It is open source. Therefore, you need to work a little bit. You need to do some brainstorming on your own. There's no one to ask. We cannot call someone and ask what the problem is. It is an open-source project without technical support. It's up to us to figure out what the problem is.

For how long have I used the solution?

I've been working with the solution for more than 11 months on two separate projects in California and Illinois. However, I have been familiar with the solution since 2017 and have used it on and off since then on a variety of projects.

What do I think about the stability of the solution?

Spring Cloud Data Flow is an open-source project, and a lot of developers are working on it. It is really stable right now. The configuration part may need some improvement, or rather, some of it could be simplified. For a simpler implementation or a smaller project, there is no problem. Whether you deploy the CDF server in PCF or in Kubernetes, there are some integrations involved.

What do I think about the scalability of the solution?

The solution scales well. 

The main reason to use the Spring Cloud Data Flow server is to scale your project. You can split it into multiple microservices and then deploy them onto multiple servers. We relied on the PCF (Pivotal Cloud Foundry) platform, which has the Spring Cloud Data Flow server integrated right in. Our microservice was running in their cluster, in multiple instances. We can increase the number of instances of these microservices as we need.

How are customer service and technical support?

The solution is open-source, so there really isn't technical support to speak of. If there are issues, we need to troubleshoot them ourselves. We need to go through the code and work through the issues independently.

Which solution did I use previously and why did I switch?

We've had experience with Apache NiFi and also Spark; however, Spark is mostly just an execution engine. They have a similar architecture. Apache NiFi, like this solution, is also open-source. We've looked at AWS Step Functions; however, their concept is closer to a serverless architecture. You don't even need to do the boilerplate coding to run the application as a microservice. You just copy the part of the code you need to execute as a function. In CDF, we write each microservice as a complete application, and the code we need runs inside a method of that microservice. With AWS Step Functions and Lambda, however, you supply only the part of the code that you need to execute and then use their platform to connect all the steps. Amazon can be expensive, as you need to pay for their services. The others you can just install on your own servers.

How was the initial setup?

During the initial setup, I ran the CDF server (just one JAR) and then the Skipper server (another JAR), created some tasks, and created a source string with an ITA service string. These steps are all simple. However, if we try to integrate with another platform where CDF is going to be deployed, then the complexity comes into play. Otherwise, if you run it on a single ECS or any kind of Linux box or server instance, there is no issue. You can do everything.

I used Docker Compose, and we Dockerized a lot of things. It was a quick deployment.

That said, each deployment is unique to each client. It's not always the same steps across the board.

What other advice do I have?

While the deployment is on-premises, the data center is not on our premises. It's in a different geographical location; however, it was the client's own data center. We deployed there, and we installed the CDF server, then the Skipper server, and everything else, including all the microservices. We used the PCF (Pivotal Cloud Foundry) platform, and for the bank, we deployed in Kubernetes.

Spring Cloud Data Flow server is pretty standard to implement. The year before, it was a new project; now it is already implemented in many, many projects. I think developers should start using it if they are not using it yet. In the future, there could be some more improvements in the area of the data pipeline ETL process. That said, I'm happy with the Spring Cloud Data Flow server right now.

Our biggest takeaway has been to design the pipeline depending on the customer's needs. We cannot just think about everything as a developer. Sometimes we need to think about what the customer needs instead. Everything needs to be based on customer flow. That helps us design a proper data pipeline. The task mechanism is also helpful, as we can run some tasks instead of keeping the application live 24 hours a day.

Overall, I'd rate the solution nine out of ten. It's a really good solution and a lot cheaper than a lot of infrastructure provided by big companies like Google or Amazon.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
SR
Associate Principal Analyst at a computer software company with 10,001+ employees
Real User
Top 20
Helpful technical support and relatively easy to set up but is not cloud agnostic

Pros and Cons

  • "Technical support is pretty helpful."
  • "Early in the process, we had some issues with stability."

What is our primary use case?

We were doing some level of stream data processing, and some of our use cases were related to IoT. We had IoT devices getting data in from other IoT devices, and Azure Stream Analytics has a special streaming analytics offering for IoT devices. That is basically what it was used for.

What is most valuable?

I basically use two features. One is Azure Event Hubs, which is used in conjunction with Azure Stream Analytics. One is the broker and the other is the processing engine. With the processing engine, what I like is the SQL way of dealing with streams, compared to other solutions that are more Scala- or Spark-based, where you need to know the language. This was comparatively easy to use thanks to its ability to write SQL on streams.
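To give a feel for what writing SQL on streams expresses, here is a rough sketch of a windowed GROUP BY computed by hand. It is plain Python over a finite list of events, not Azure Stream Analytics query syntax, and the event shape `(timestamp, device_id, value)`, the function name, and the sample readings are assumptions for illustration.

```python
from collections import defaultdict


def tumbling_window_average(events, window_seconds):
    """Average `value` per device over fixed, non-overlapping time windows.

    `events` is an iterable of (timestamp_seconds, device_id, value) tuples.
    Returns {(window_start, device_id): average}.
    """
    sums = defaultdict(lambda: [0.0, 0])  # key -> [running total, count]
    for timestamp, device_id, value in events:
        # Every timestamp falls into exactly one window.
        window_start = (timestamp // window_seconds) * window_seconds
        bucket = sums[(window_start, device_id)]
        bucket[0] += value
        bucket[1] += 1
    return {key: total / count for key, (total, count) in sorted(sums.items())}


# Hypothetical sensor readings.
events = [
    (0, "sensor-a", 20.0),
    (5, "sensor-a", 22.0),
    (12, "sensor-a", 30.0),
    (3, "sensor-b", 10.0),
]
averages = tumbling_window_average(events, window_seconds=10)
print(averages)
```

A SQL-on-streams engine lets you state the grouping and the window declaratively and takes care of the bookkeeping that this function does by hand.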

Technical support is pretty helpful. 

It's my understanding that the setup is pretty straightforward.

What needs improvement?

With Azure specifically, the drawback is that it is a very Azure-specific product. You can't connect it to things outside of Azure. For example, Spark or Databricks can be used in any cloud, including AWS. This product doesn't work that way. It's not a hybrid solution, and it's not a cloud-agnostic solution that you can put on other clouds.

We had some connections which we wanted to make with AWS, which we couldn't do with this. We had to use something else for that.

Early in the process, we had some issues with stability.

You cannot do joins on streams of data, for example, one stream joining with another stream. Real-time to real-time joins are not possible. You can only join your stream with static data from your Azure storage.
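Joining a stream with static data amounts to a lookup for every incoming event. The sketch below shows that idea in plain Python with inner-join semantics; it is not Azure Stream Analytics code, and the field names and sample rows are assumptions.

```python
def enrich_with_reference(stream, reference, key):
    """Yield each event merged with its matching reference row.

    Events whose key has no match in `reference` are dropped (inner join).
    """
    for event in stream:
        match = reference.get(event[key])
        if match is not None:
            yield {**event, **match}


# Hypothetical static reference data, loaded once from storage.
reference_data = {
    "sensor-a": {"site": "plant-1"},
    "sensor-b": {"site": "plant-2"},
}

# Hypothetical incoming readings; "sensor-x" has no reference row.
readings = [
    {"device": "sensor-a", "temp": 21.5},
    {"device": "sensor-x", "temp": 19.0},
]

enriched = list(enrich_with_reference(iter(readings), reference_data, key="device"))
print(enriched)
```

A stream-to-stream join is harder because neither side is fixed: both inputs keep arriving, so matches have to be bounded by a time window.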

For how long have I used the solution?

I've used the solution for one and a half to two years.

What do I think about the stability of the solution?

There were some issues with the IoT jobs in Azure Stream Analytics, which are much more reliable now. That said, earlier, we used to have a lot of issues with the erratic behavior of jobs. If the data does not arrive the way the jobs expect it, or is not modeled correctly, the jobs tend to break or fail quite a lot. That was one issue we had.

How are customer service and technical support?

We've been in touch with technical support. There was a time when jobs failed a lot and we couldn't understand the reason. When we talked to tech support, they looked into our data and told us that it was not modeled exactly how Azure Stream Analytics needs it. That wasn't very clear when we got it.

They were helpful. There were issues which they handled, which they told us about. The communication was great.

We had the support package included.

Which solution did I use previously and why did I switch?

I'm now an analyst, so I don't use the products per se; however, prior to this, I used Azure Stream Analytics quite a lot. Currently, I'm working a bit with Databricks Spark Streaming. These two are what I have used personally.

How was the initial setup?

The product was set up before I started; however, having set up some things personally, I can say it is comparatively straightforward, and the Microsoft support for that is comparatively good.

What's my experience with pricing, setup cost, and licensing?

In terms of pricing, you can't compare it to open source solutions. It would be higher compared to open source, of course, however, with the support and everything you're getting, I would say the price, in general, is fair. 

I have seen AWS as well and can compare it to that, and I would say it is fair. The problem is that it is not exactly dynamic or serverless, the way things usually are in the cloud. Therefore, it is not completely utilized. You have to set things up beforehand with some level of capacity. In regards to the price, it's not too high and also not too low.

Their pricing is not exactly serverless. It's per hour. A lot of others are moving towards pricing based on the amount of data you pull. Stream Analytics charges per hour, and in that sense, you need to set up the capacity by trial and error, literally.
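The difference between the two pricing models comes down to simple arithmetic, sketched below. Every number here is a placeholder chosen to make the calculation easy to follow; none of them is a real Azure, AWS, or GCP price, and the function names are made up for the example.

```python
def hourly_cost(units: int, rate_per_unit_hour: float, hours: float) -> float:
    """Cost when you pay for provisioned capacity per hour, used or not."""
    return units * rate_per_unit_hour * hours


def volume_cost(gigabytes: float, rate_per_gigabyte: float) -> float:
    """Cost when you pay only for the data actually processed."""
    return gigabytes * rate_per_gigabyte


def utilization(actual_events_per_second: float, provisioned_events_per_second: float) -> float:
    """Fraction of the provisioned capacity that is really in use."""
    return actual_events_per_second / provisioned_events_per_second


# Placeholder inputs: 3 provisioned units for a 720-hour month,
# versus 500 GB processed under a per-volume model.
provisioned = hourly_cost(units=3, rate_per_unit_hour=0.5, hours=720)
per_volume = volume_cost(gigabytes=500, rate_per_gigabyte=0.25)
used_fraction = utilization(actual_events_per_second=200, provisioned_events_per_second=1000)
print(provisioned, per_volume, used_fraction)
```

With hourly pricing the bill stays the same however low the used fraction is, which is why sizing the capacity ends up being a matter of trial and error.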

Which other solutions did I evaluate?

I'm comparing Azure Stream Analytics, AWS Kinesis, GCP Pub/Sub, and Dataflow, and I'm currently in the process of writing that research.

What other advice do I have?

If you are in the Azure world completely, and you're using the Microsoft stack completely, and you do not have the need to go in any other cloud, then it makes sense to use this solution as it integrates very well within the Azure ecosystem. 

For IoT use cases, if you want to do real-time dashboarding with Power BI, it's great. Those kinds of things are where it has its niche. However, if you want a cloud-agnostic kind of solution, where you do not want to be stuck with just Microsoft, then there are other solutions out there such as Confluent, Kafka, Spark Streaming with Databricks, et cetera. You'll get the flexibility you need using any of those platforms.

I'd rate the solution at a seven out of ten. We had some issues with the jobs not behaving properly. They promise a lot, however, sometimes that doesn't happen and we realized that later. Some things under the hood, we couldn't really understand and we needed to get in touch with support. Those kinds of issues are where I would say it needs a bit of improvement, and maybe that's why I cut off two or three points.

Which deployment model are you using for this solution?

Public Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.