What is our primary use case?
When we receive data from the messaging queue, we process everything using Apache Spark. Databricks does the processing and sends everything back to the Apache file in the data lake. The machine learning program then runs an analysis using an ML prediction algorithm.
What is most valuable?
The in-memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it runs in the cluster itself. It acts as an in-memory engine and is very effective in processing data correctly.
What needs improvement?
There are lots of items coming down the pipeline in the future. I don't know what features are missing. From my point of view, everything looks good.
The graphical user interface (UI) could be a bit clearer. It's very hard to make sense of the execution logs and understand how long it takes to send everything. If an execution is lost, it's not easy to understand why or where it went. I have to manually drill down into the data processes, which takes a lot of time. Maybe there could be a metrics monitor, or maybe the whole log analysis could be improved to make it easier to understand and navigate.
There should be more information shared with the user. The solution already tracks all of this information in the cluster. It just needs to be accessible and searchable.
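To illustrate the kind of summary that would help here, below is a minimal Python sketch that ranks stages by wall-clock duration and lists the failed ones. It assumes records shaped like the stage entries returned by Spark's monitoring REST API (/api/v1/applications/<app-id>/stages); the sample values and the helper names parse_ts, stage_durations, and failed_stages are hypothetical and only for illustration.

```python
from datetime import datetime

# Made-up sample records, shaped like stage entries from Spark's
# monitoring REST API. In practice these would be fetched from the
# driver UI or history server; here they are hard-coded (assumption).
SAMPLE_STAGES = [
    {"stageId": 0, "name": "read from queue", "status": "COMPLETE",
     "submissionTime": "2021-01-01T10:00:00.000GMT",
     "completionTime": "2021-01-01T10:00:12.500GMT"},
    {"stageId": 1, "name": "aggregate", "status": "COMPLETE",
     "submissionTime": "2021-01-01T10:00:12.500GMT",
     "completionTime": "2021-01-01T10:03:42.500GMT"},
    {"stageId": 2, "name": "write to data lake", "status": "FAILED",
     "submissionTime": "2021-01-01T10:03:42.500GMT",
     "completionTime": "2021-01-01T10:03:50.000GMT"},
    {"stageId": 3, "name": "score with ML model", "status": "PENDING"},
]


def parse_ts(value):
    """Parse a timestamp such as 2021-01-01T10:00:00.000GMT."""
    return datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fGMT")


def stage_durations(stages):
    """Return (stageId, name, status, seconds) tuples, slowest first.

    Stages without both timestamps get a duration of None and sort last.
    """
    rows = []
    for stage in stages:
        start = stage.get("submissionTime")
        end = stage.get("completionTime")
        seconds = None
        if start and end:
            seconds = (parse_ts(end) - parse_ts(start)).total_seconds()
        rows.append((stage["stageId"], stage["name"], stage["status"], seconds))
    # Sort key: stages with a known duration first, then longest duration first.
    rows.sort(key=lambda r: (r[3] is None, -(r[3] or 0.0)))
    return rows


def failed_stages(stages):
    """Return the IDs of stages whose status is FAILED."""
    return [s["stageId"] for s in stages if s["status"] == "FAILED"]


if __name__ == "__main__":
    for stage_id, name, status, seconds in stage_durations(SAMPLE_STAGES):
        print(stage_id, name, status, seconds)
    print("failed:", failed_stages(SAMPLE_STAGES))
```

A summary like this puts the slowest and failed stages at the top, so there is less manual drilling through individual logs.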
For how long have I used the solution?
I started using the solution about four years ago. However, it's been on and off since then. I would estimate in total I have about a year and a half of experience using the solution.
What do I think about the stability of the solution?
The stability of the solution is very, very good. It doesn't crash or have glitches. It's quite reliable for us.
What do I think about the scalability of the solution?
The scalability of the solution is very good. If a company has to expand it, they can do so.
Right now, we have about six or seven users who work directly with the product. We're encouraging them to use more data. We do plan to increase usage in the future.
How are customer service and technical support?
I'm a developer, so I don't interact directly with technical support. I can't speak to the quality of their service as I've never directly dealt with them.
Which solution did I use previously and why did I switch?
We previously used a lot of different mechanisms. However, we needed something that was good at processing data for analytical purposes, and this solution fit the bill. It's a very powerful tool. I haven't seen other tools that could do precisely what this one does.
How was the initial setup?
The initial setup isn't too complex. It's quite straightforward.
We use CI/CD DevOps for deployment. We only use Spark for processing, with the Databricks cluster spinning up to do the job. It's continuously running in the background.
There isn't really any maintenance required per se. We just click the button and it comes up automatically, with the whole cluster and the Spark and everything ready to go.
What's my experience with pricing, setup cost, and licensing?
I'm unsure as to how much the licensing is for the solution. It's not an aspect of the product I deal with directly.
What other advice do I have?
We're customers and also partners with Apache.
While we are on version 2.6, we are considering upgrading to version 3.0.
I'd rate the solution nine out of ten. It works very well for us and suits our purposes almost perfectly.