What is our primary use case?
In my last project, I worked with Spring Cloud Data Flow (SCDF). We created a stream with it and also used the Spring Kafka binder. The project included creating a data lake for our clients.
The platform that we created maintained a data lake for an internet banking user and provided an out-of-the-box solution for integration with it. We used SCDF to gather the data and for our ETL (extract, transform, and load) pipelines.
What is most valuable?
The most valuable feature is real-time streaming.
It integrates very well with Kafka. The integration of Elasticsearch Appian was also very good because we just attached Appian to a pipeline. We had an on-premises Elasticsearch cloud, so we were able to connect to the data.
It is open-source and has rich community support.
What needs improvement?
Some of the features, like the monitoring tools, are not very mature and are still evolving. Some of the products we used did not integrate well and hung a lot. One of the advantages of using open-source is that if you don't like a particular tool, you can use another one.
If you want to use Kubernetes, then you have to optimize a lot in terms of resources. I had a 15 GB MacBook Pro, but initially it wouldn't work because it would hang. There were also some weird shutdowns. We weren't able to figure out exactly why they happened, but it was clearly due to not having enough system resources. We then needed to optimize and increase our heap memory.
For how long have I used the solution?
We used this product for almost six months in my previous company.
How are customer service and technical support?
This product has a rich support community.
What's my experience with pricing, setup cost, and licensing?
This is an open-source product that can be used free of charge.
What other advice do I have?
We used this product with Kubernetes, which had been recently introduced, and we liked it. It was very good compared to Maven. We did try it with Maven; however, the server took 15 or 16 minutes to start. That is when we switched to Kubernetes, and it was very good. They provide a lot of different configurations and environment types. We used Kafka on Kubernetes as well. The configuration was provided by SCDF.
I would rate this solution a seven out of ten.
Which deployment model are you using for this solution?