What is our primary use case?
We are building internal tools and custom models for predictive analysis. We are currently building a platform where we can integrate multiple data sources, such as data that is coming from Azure, AWS, or any SQL database. We integrate the data and run our models on top of that.
We primarily use Databricks for data processing and for SQL databases.
What is most valuable?
I found PySpark to be the most useful tool. It performs calculations in memory, so when you want to run a model it does so very quickly. We used to use Python, and when we migrated to PySpark the performance was much better.
What needs improvement?
It would be very helpful if Databricks could integrate with platforms in addition to Azure.
Having an open-source version or having the option to get a trial version of Databricks would be very helpful.
It would be very useful for beginners if there were tutorials and examples showing how to write code in PySpark, R, or Scala. Having examples would give people something to refer to and experiment with.
For how long have I used the solution?
We have been using Databricks for the past two or three years.
What do I think about the stability of the solution?
A couple of times, I faced an issue where a long-running process took a lot of time and then stopped abruptly, so we had to restart the process.
What do I think about the scalability of the solution?
We are in the prototyping stage so we do not plan on increasing our usage yet.
How are customer service and technical support?
We have not been in contact with technical support.
Which solution did I use previously and why did I switch?
Before using Databricks, we were running our own cluster with a web server that executed our Python queries.
How was the initial setup?
The initial setup is straightforward. With respect to deployment, the development can be done within half an hour, and we can deploy our code from there.
What about the implementation team?
We implemented Databricks on our own. We haven't deployed as such, as we are just running our queries and it is not in production yet.
What other advice do I have?
I work in the data science field and I found Databricks to be very useful. If I want to run any models then I can code them in PySpark. If you are coming from a Python background then you can write code in PySpark and it runs quickly. This is a good solution in terms of performance.
I would rate this solution a nine out of ten.
Disclosure: My company has a business relationship with this vendor other than being a customer: Partner