
Google Cloud Dataflow Competitors and Alternatives


Read reviews of Google Cloud Dataflow competitors and alternatives

Mohammad Masudu Rahaman
Founder at Talkingdeal.com LLC
Real User
Top 10
Good logging mechanisms, a strong infrastructure and pretty scalable

Pros and Cons

  • "There are a lot of options in Spring Cloud. It's flexible in terms of how we can use it. It's a full infrastructure."
  • "The configurations could be better. Some configurations are a little bit time-consuming in terms of trying to understand using the Spring Cloud documentation."

What is our primary use case?

Mostly, the use cases are related to building data pipelines. There are multiple microservices working in the Spring Cloud Data Flow infrastructure, and we are building a data pipeline, mostly a step-by-step process that processes data using Kafka. Most of the processors, sinks, and sources are developed based on the customers' business requirements or use cases.

In the example of the bank we work with, we are building a document analysis pipeline. There are some defined sources where we get the documents. Later on, we extract some information from them and export the data to multiple destinations. We may export it to the POGI database and/or to a Kafka topic.

For CoreLogic, we were importing data into Elasticsearch. We had a BigQuery data source; from there, we did some transformation of the data and then imported it into the Elasticsearch clusters. That was the ETL solution.
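As an illustration of what one step in such a Kafka-backed pipeline can look like, here is a minimal sketch of a Spring Cloud Stream processor that could be registered in Spring Cloud Data Flow. It assumes the Kafka binder (spring-cloud-stream-binder-kafka) is on the classpath; the class name, function name, and the trivial transformation are hypothetical and not taken from the reviewer's projects.

```java
import java.util.function.Function;

import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.context.annotation.Bean;

@SpringBootApplication
public class DocumentProcessorApplication {

    public static void main(String[] args) {
        SpringApplication.run(DocumentProcessorApplication.class, args);
    }

    // With a single Function bean and the Kafka binder on the classpath,
    // Spring Cloud Stream binds the input and output to Kafka topics, so the
    // application can act as the "processor" step of a stream definition
    // such as: source | extractSummary | sink.
    @Bean
    public Function<String, String> extractSummary() {
        return document -> {
            // Placeholder transformation; real extraction logic would go here.
            return document.trim();
        };
    }
}
```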

How has it helped my organization?

Cloud platforms such as PCF all have their own microservice management infrastructure. However, if you have CDF running, the developer has more control over the messaging platform. Being able to control how data flows from one microservice to another is great; as a developer, I feel more in control. Some hosted services or hosted infrastructure let us run smaller microservices, but they are infrastructure dependent, and if anything happens (a bug or any other issue), it can be difficult to trace the problem. That's not true here. CDF is really good at logging, so as a developer I can use my Spring Boot logging mechanism to check what the problem is, and that helps a lot.

I've been working with this kind of infrastructure for eight or nine years at this point, so I feel comfortable with it. CDF is essentially infrastructure for Spring Boot applications that run inside it, either as tasks or as long-lived microservices in the data pipeline. If you have a Spring Cloud Data Flow server implemented in your project, that means you have your own data pipeline architecture, and you can design the flow of data processing as you wish.

There is also logging for these short-lived tasks: when a task starts and when it stops. This kind of logging also helps provide more transparency.

In terms of the direct benefit to the company, they spend less money. If you have some kind of hosted BPM or hosted service to orchestrate your microservices, you need to pay fees to a company to manage it. However, if your developers can manage CDF themselves, that management cost is reduced. I'm not sure of the actual hard costs; however, I am aware of the savings.

What is most valuable?

Mostly we enjoy the orchestration of microservices: you can have a Spring Boot application and build your own steps, and you can deal with as many processors as you need. There is also Spring Task inside CDF, which is helpful for temporary work. You can trigger a task, it does something in a short amount of time, and then it finishes. No memory stays occupied by a running microservice: the microservice starts, does some work, then dies, and the memory is released. These kinds of temporary activities are also helpful.

It's a low-resource type of product. If you have a scheduler running and a lot of smaller tasks for it to execute, you don't need to keep a microservice running. You can trigger a task, the task will be executed, then it shuts down, the JAR execution ends, and the memory is released. So you never need to keep any long-lived microservices running.
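To make the task mechanism more concrete, below is a minimal sketch of a short-lived Spring Cloud Task application of the kind described above, assuming the spring-cloud-starter-task dependency is present. The class name and the work it performs are hypothetical, purely for illustration.

```java
import org.springframework.boot.CommandLineRunner;
import org.springframework.boot.SpringApplication;
import org.springframework.boot.autoconfigure.SpringBootApplication;
import org.springframework.cloud.task.configuration.EnableTask;
import org.springframework.context.annotation.Bean;

@EnableTask
@SpringBootApplication
public class CleanupTaskApplication {

    public static void main(String[] args) {
        SpringApplication.run(CleanupTaskApplication.class, args);
    }

    // The runner does its work once; when it finishes, the application exits,
    // the JAR execution ends, and the memory is released, matching the
    // short-lived behavior described above.
    @Bean
    public CommandLineRunner cleanup() {
        return args -> System.out.println("Running one-off cleanup step...");
    }
}
```

A task like this can then be registered and launched from the Spring Cloud Data Flow server, or triggered by a scheduler, rather than being kept running as a long-lived service.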

There are a lot of options in Spring Cloud. It's flexible in terms of how we can use it. It's a full infrastructure.

What needs improvement?

The configurations could be better. Some configurations are a little bit time-consuming in terms of trying to understand using the Spring Cloud documentation. 

The documentation on offer is not that good. The Spring Cloud Data Flow documentation for the configurations is not exactly clear. Sometimes they provide examples that are incomplete: some parts are described in the documentation but not shown in example code. When we tried to implement multiple configurations, for example, when we integrated PCF, Pivotal Cloud Foundry, with CDF, there were issues. PCF has a workspace concept; however, when we tried to implement the workspace in CDF, some boundary configuration was not integrating properly. We then went to the documentation and tried to customize it a little bit at the configuration level, not at the code level, to get the solution working.

It is open source. Therefore, you need to work a little bit. You need to do some brainstorming on your own. There's no one to ask. We cannot call someone and ask what the problem is. It is an open-source project without technical support. It's up to us to figure out what the problem is.

For how long have I used the solution?

I've been working with the solution for more than 11 months on two separate projects in California and Illinois. However, I have been familiar with the solution since 2017 and have used it on and off since then on a variety of projects.

What do I think about the stability of the solution?

Spring Cloud Data Flow is an open-source project and a lot of developers are working on it. It is really stable right now. The configuration part may need some improvement, or rather simplification; some configurations could be simplified somehow. For a simpler implementation or a smaller project, there is no problem. Whether you deploy the CDF server in PCF or in Kubernetes, though, there are some integrations to deal with.

What do I think about the scalability of the solution?

The solution scales well. 

The main reason to use the Spring Cloud Data Flow server is for scaling your project. You can split it into multiple microservices and then deploy them to multiple servers. We got help from the PCF platform, as Pivotal Cloud Foundry has the Spring Cloud Data Flow server integrated right in. Our microservices ran in their cluster in multiple instances, and we can increase the number of instances of these microservices as we need.

How are customer service and technical support?

The solution is open-source, so there really isn't technical support to speak of. If there are issues, we need to troubleshoot them ourselves. We need to go through the code and work through the issues independently.

Which solution did I use previously and why did I switch?

We've had experience with Apache NiFi and also Spark; however, Spark is mostly just an execution engine. They also have a similar architecture, and Apache NiFi, like this solution, is also open source. We've looked at AWS Step Functions, however, their concept is closer to a serverless architecture: you don't even need to do the boilerplate coding to run the application as a microservice, you just supply the part of the code you need to execute as a function. In CDF, we write the microservice as a full application and run our code inside that microservice, whereas with AWS Step Functions and Lambda you only provide the piece of code you need to execute and then use their platform to connect all the steps. Amazon can be expensive, as you do need to pay for their services; the others you can just install on your own servers.

How was the initial setup?

During the initial setup, I ran the CDF server (just one JAR) and then the Skipper server (another JAR), created some tasks, and created a stream with a source and a sink. These steps are all simple. However, if we try to integrate with some kind of platform, for example, another platform where I'm going to deploy CDF, then the complexity comes into play. Otherwise, if you can run it in a single ECS or any kind of Linux box or server instance, there is no issue; you can do everything.

I used Docker Compose, and we Dockerized a lot of things. It was a quick deployment.

That said, each deployment is unique to each client. It's not always the same steps across the board.

What other advice do I have?

While the deployment is on-premises, the data center is not at our own location; it's in a different geographical location, however, it is the client's own data center. We deployed there and installed the CDF server, then the Skipper server, and everything else, including all the microservices. We used the PCF (Pivotal Cloud Foundry) platform, and for the bank, we deployed in Kubernetes.

The Spring Cloud Data Flow server is pretty standard to implement. A year ago it was a new project; however, now it is already implemented in many, many projects. I think developers should start using it if they are not using it yet. In the future, there could be some more improvements in the area of the data pipeline ETL process. That said, I'm happy with the Spring Cloud Data Flow server right now.

Our biggest takeaway has been to design the pipeline depending on the customer's needs. We cannot just think about everything as developers; sometimes we need to think about what the customer needs instead. Everything needs to be based on the customer's flow. That helps us design a proper data pipeline. The task mechanism is also helpful, as we can run some tasks instead of keeping the application live 24 hours a day.

Overall, I'd rate the solution nine out of ten. It's a really good solution and a lot cheaper than a lot of infrastructure provided by big companies like Google or Amazon.

Which deployment model are you using for this solution?

On-premises
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Alwin George Daniel
RPA DevOps Engineer at SG Analytics
Real User
Top 10
Effective Blob storage and the IoT hub save us a lot of time, and the support is helpful

Pros and Cons

  • "The most valuable features are the IoT hub and the Blob storage."
  • "There may be some issues when connecting with Microsoft Power BI because we are providing the input and output commands, and there's a chance of it being delayed while connecting."

What is our primary use case?

We have different kinds of IoT devices placed in different countries, including the UK, US, and others. They are configured with our IoT hub, and we get the logs from them accordingly. We have these logs connected with Stream Analytics and Microsoft Power BI. Whatever updates and other activity happen on the devices are streamed into Azure and Power BI so that we can see them.

If we find any error messages then we have to check the health of the corresponding IoT devices, databases, and configuration.

How has it helped my organization?

This gives us a real-time monitoring system that we can use to analyze the health of our IoT devices. Previously, when something was not working properly, we would receive messages in our email using the TeamWork application. Now, instead of checking email, we receive an audible alert ping, which allows us to evaluate how well the machine is doing. We can check the performance and other relevant metrics.

In general, it gives us more visibility in terms of what is going on. We used to receive between 10,000 and 20,000 emails per week, which was hectic for us to calculate and keep track of. Since implementing Azure, we have been able to monitor things very easily. Not only does it create an interval for the logs but it reduces the number of duplicates.

We have not eliminated the messages that come in as email, as high-priority messages are still delivered in that manner. For example, if there is a power shut-down then we will be notified via email. This is set up in case we miss these types of messages in the BI platform.

What is most valuable?

The most valuable features are the IoT hub and the Blob storage. All of the logs and other data that we are getting can be stored in Blobs.

The interface is user-friendly.
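As a rough illustration of the kind of Blob usage described here, the sketch below writes a device log entry to Blob storage with the Azure Storage Blob SDK for Java (azure-storage-blob). The container name, blob path, log payload, and environment variable are hypothetical and not taken from the reviewer's environment; in their setup, Stream Analytics writes to Blob storage as a configured output rather than through hand-written code.

```java
import com.azure.storage.blob.BlobClient;
import com.azure.storage.blob.BlobContainerClient;
import com.azure.storage.blob.BlobServiceClientBuilder;

import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;

public class DeviceLogArchiver {
    public static void main(String[] args) {
        // Hypothetical connection string supplied via an environment variable.
        String connectionString = System.getenv("AZURE_STORAGE_CONNECTION_STRING");

        // The container is assumed to already exist.
        BlobContainerClient container = new BlobServiceClientBuilder()
                .connectionString(connectionString)
                .buildClient()
                .getBlobContainerClient("device-logs");

        // A hypothetical device health record; real payloads would come from the IoT hub.
        String logLine = "{\"deviceId\":\"uk-sensor-01\",\"status\":\"healthy\"}";
        byte[] bytes = logLine.getBytes(StandardCharsets.UTF_8);

        // Each batch of logs is written as its own blob; overwrite=true replaces
        // an existing blob with the same name.
        BlobClient blob = container.getBlobClient("uk-sensor-01/2021-10-01.json");
        blob.upload(new ByteArrayInputStream(bytes), bytes.length, true);
    }
}
```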

What needs improvement?

There may be some issues when connecting with Microsoft Power BI because we are providing the input and output commands, and there's a chance of it being delayed while connecting.

For how long have I used the solution?

I have been using Azure Stream Analytics for just over one year.

What do I think about the stability of the solution?

This product is stable but if our VM goes down then we are not able to get a proper instance update. When this happens, we need to kill these instances. Situations like this only happen rarely.

What do I think about the scalability of the solution?

The scalability is based on the requirements. If the requirements are high then highly-scalable machines are needed. If it is more manageable then it is cheaper. I think that scaling is really about the cost.

We have a development team and an operations team working with Azure Stream Analytics. There are seven or eight people on the operations team. The customer also has access to the platform if they require it.

How are customer service and technical support?

If you raise a ticket with technical support then they will contact you within 24 hours. However, we have not faced many issues, so we haven't had much involvement with them.

There is a diagnostic tool available in Azure and you can check to see if you have any issues on your end. If there are problems then you can contact support for assistance.

Overall, I think that the support is very helpful.

Which solution did I use previously and why did I switch?

Since transitioning from our email-only solution, we have been able to set the interval that we use to retrieve logs from the devices.

We did not use a similar product before this one for the same purpose. The company has been using Azure since before I joined, although they had used AWS for other tasks. At this company, I have not had the opportunity to work on AWS.

How was the initial setup?

I have not completed a deployment for production purposes. Rather, I performed a setup for training with Azure and an IoT simulator, and we just checked the logs during my practice session. My role in the operation was to lead the management team.

The training deployment that I completed was user-friendly and anyone can easily do it. Even as part of the operations team, I was able to capture the details and complete the deployment really quickly.

The only difficulty that I faced was connecting with the different machines in the outside layer, such as BI or Kibana. Depending on the application I was connecting with, there were issues with it.

What about the implementation team?

The deployment was done by our development team, and they are responsible for the maintenance as well. Because it is a platform as a service, Azure takes care of almost everything.

What was our ROI?

I am not familiar with the details of the investment. This is something that is handled completely by the product owner. This would be my manager or the Delivery Manager.

What's my experience with pricing, setup cost, and licensing?

The cost of this solution is less than competitors such as Amazon or Google Cloud. If we only use one hour then we are only charged for one hour. It is very easy and some products are more expensive.

What other advice do I have?

Azure Stream Analytics is something that we were able to learn easily. It doesn't take much programming skill, so I feel that it is easy to start using.

Other than the problem with delays in connecting to Microsoft BI, Kibana, or other monitoring tools, I don't have any other issues with this product.

I would rate this solution a nine out of ten.

Which deployment model are you using for this solution?

Public Cloud

If public cloud, private cloud, or hybrid cloud, which cloud provider do you use?

Microsoft Azure
Disclosure: I am a real user, and this review is based on my own experience and opinions.