What is most valuable?
The ability to quickly develop applications which reliably process very large volumes of time series sensor data with low latency is a critical need for us.
Although there is a rich and growing set of tool kits that provide specialized functionality, the time series tool kit is especially important. When used for what it was specifically designed to do, which is to process large amounts of data in motion with low latency, it is very good.
How has it helped my organization?
The product has enabled us to create solutions to client problems that would have been either impossible or very expensive and difficult using other technologies. It allows us to focus on the business logic for the applications rather than the plumbing.
What needs improvement?
I’d like to see a tool kit specifically targeted at incremental machine learning.
It’s already great for scoring previously trained models, but dynamically updating models is currently more of a 'grow your own' kind of thing.
It might also be useful to have more options for dynamically scaling the runtime environment.
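To illustrate the difference between scoring a fixed model and updating one incrementally, here is a minimal sketch in plain Python. It is not a Streams API or tool kit; the class and function names (`RunningGaussian`, `process`) and the threshold value are hypothetical, chosen only for this example. Each sensor reading is first scored against the current model and then folded into it, using Welford's running mean and variance.

```python
import math


class RunningGaussian:
    """Running mean/variance (Welford's algorithm), updated one reading at a time.

    Hypothetical illustration only; not part of any Streams tool kit.
    """

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        # Fold a single new reading into the model without revisiting old data.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def score(self, x):
        # Absolute z-score of x under the current model.
        # Returns 0.0 until a sample variance is defined and non-zero.
        if self.n < 2:
            return 0.0
        std = math.sqrt(self.m2 / (self.n - 1))
        if std == 0.0:
            return 0.0
        return abs(x - self.mean) / std


def process(readings, threshold=3.0):
    # Score each reading against the model as it stood before the reading
    # arrived, then update the model with that reading.
    model = RunningGaussian()
    flagged = []
    for x in readings:
        if model.score(x) > threshold:
            flagged.append(x)
        model.update(x)
    return model, flagged


if __name__ == "__main__":
    model, flagged = process([10.0, 10.2, 9.8, 10.1, 9.9, 25.0, 10.0])
    print("flagged readings:", flagged)
    print("readings seen:", model.n)
```

A production version would also need to handle concept drift, for example by decaying old observations, which is the kind of work that is currently left to the developer.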
For how long have I used the solution?
I’ve been actively working with Streams for more than eight years, since before it reached general availability (GA).
What do I think about the stability of the solution?
The maturity of the product has resulted in an extremely stable, truly enterprise-class platform.
Issues that arise with new implementations are typically related to interfaces with external systems which can’t keep up with the throughput of Streams, or custom embedded code that wasn’t correctly implemented.
The Streams operators and runtime environment themselves are extremely reliable and stable. It’s not unusual for Streams-based solutions to run continuously for years.
What do I think about the scalability of the solution?
I would consider Streams to be a cluster-focused rather than cloud-focused technology.
This gives me a great deal of control for relatively predictable workloads.
I have yet to come across a use case where the system could not be appropriately sized to accommodate the projected loads while meeting very low latency requirements. That includes some situations where the data volumes are truly enormous and continuous.
How are customer service and technical support?
One of the benefits of a commercial product is that the level of support is great. In addition to the formal issue reporting process, there is an active online community that is also monitored and contributed to by the vendor.
They also proactively evaluate evolving customer needs and are continuously enhancing the offering.
Which solution did I use previously and why did I switch?
Back in 2009, when we started using Streams, there was nothing else like it, and implementing the capability from scratch would have been a very substantial undertaking.
Since then, many commercial and open source offerings have entered the space with various levels and durations of acceptance. Depending on the requirements of the application, some of them may be an appropriate alternative.
However, for my clients and use cases, Streams provides a unique combination of stability, reliability, capability, and performance. Although I now have more potential alternatives for performing some of the overall processing pipeline, I still depend on Streams to handle much of the 'heavy lifting'.
How was the initial setup?
Installing Streams is a straightforward exercise. However, as with any platform for large/fast data, the architectural and application design considerations should not be taken lightly. You can do stupid things with even the best of systems.
What's my experience with pricing, setup cost, and licensing?
I’m a huge believer in the open source concept. However, it’s important to consider the total cost of ownership.
Many solutions that appear to be low cost on the surface end up costing more to develop and operate. One of the many strengths of the Streams platform is its computational efficiency.
It’s common to be able to run the same load on much less hardware than would be required for some of the alternatives. This not only lowers recurring operational costs but also simplifies administration.
Which other solutions did I evaluate?
It depends on the project. The set of available options has evolved over time with many new players showing up for a while, getting hot, and then cooling off in favor of other things.
Currently, and for the foreseeable future, Streams is still my primary data-in-motion processing platform.
However, there are situations where I consider using the Structured Streaming capabilities of Apache Spark 2.x, as well as Apache Kafka and/or Redis, sometimes together with Streams.
What other advice do I have?
Review the product information on the company’s website and download the free quick-start version to try it out for yourself.
Ask questions in the development community or on GitHub and reach out to the vendor for presales support. They have a rich set of benchmarks, cost studies, and use case stories that may be relevant to your application.
Which version of this solution are you currently using?