How do you or your organization use this solution?
Please share with us so that your peers can learn from your experiences.
My division works with Big Data and Data Science, and Databricks is one of the tools for Big Data that we work with. We are partners with Microsoft and we began working with this solution for one specific project in the financial industry.
I am a data scientist by official role, and I own the company. Our team is quite small at this point: around five people, working with about five different businesses. The projects we get from them are massive undertakings, so each of us takes on multiple roles and we use multiple tools to best serve our clients. We look for creative ways to integrate different solutions and try to understand which products we can use to build solutions that effectively meet our clients' needs. We use Databricks for certain projects where we want to create intelligent solutions, and part of my role is to see whether there are standard products we can use with it. We know that Databricks integrates with Airflow, so that is something we are exploring right now. We are also exploring the cloud as an option: Databricks is available in Azure, and we are currently assessing the viability of that as a cloud platform and how Databricks and Azure integrate to give us that flexibility. What we use it for right now is more like asset management. If we have a lot of assets and receive a lot of real-time data, we certainly want to process some of that data, but we do not want to have to work on all of it in real time. That is why we use Databricks. We push the data from Azure through Databricks, work on the data algorithm in Databricks, and execute it from Azure, probably with RPA (Robotic Process Automation) or something of that sort. It intelligently offloads real-time processing.
We are still exploring the solution. For our clients, we find it much more useful than the star schema models they are trying to replace. We bring in Databricks and then see how they can leverage the additional analytical functionality around the Databricks cloud. Our use is mostly exploratory. We recommend Databricks, especially with the Azure cloud frameworks.
We use the solution for multiple purposes. We do a lot of data crunching, development, and algorithm work on it.
We are building internal tools and custom models for predictive analysis. We are currently building a platform where we can integrate multiple data sources, such as data that is coming from Azure, AWS, or any SQL database. We integrate the data and run our models on top of that. We primarily use Databricks for data processing and for SQL databases.
We primarily use the solution to run current jobs; that is, we run our Spark jobs as the current job.
We use this solution for streaming analytics. We use machine learning functions that output to the API and work directly with the database.
I am a developer and I do a lot of consulting using Databricks. We have been primarily using this solution for ETL purposes. We also do some migration of on-premises data to the cloud.
Our primary use case is really DevOps, for integration and continuous development. We've combined our database with some components from Azure to deploy elements in Sandbox for our data scientists and for our data engineers.
We are using this solution to run large analytics queries and prepare datasets for SparkML and ML using PySpark. We ran on multiple clusters, each set up with a minimum of three and a maximum of nine nodes with 16 GB of RAM each. For one ad hoc requirement, a 32-node cluster was required. The Databricks clusters were set to autoscale and to time out after forty minutes of inactivity. Multiple users attached their notebooks to a shared cluster. When some workloads required different libraries, a dedicated cluster was spun up for that user.
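For readers who want to reproduce a similar setup, the cluster settings described above (three to nine nodes, autoscaling, a forty-minute inactivity timeout) could be written down roughly as follows. This is only a sketch under stated assumptions: the helper name `build_cluster_spec` and the cluster names are hypothetical, the runtime version and node type are left as placeholders because the review does not give them, and treating the "nodes" in the review as autoscaling workers is my reading rather than something the reviewer states. The field names follow the Databricks Clusters API as I understand it, so check them against the current documentation before use.

```python
import json


def build_cluster_spec(name, min_workers=3, max_workers=9, idle_minutes=40,
                       spark_version="<runtime-version>",
                       node_type_id="<16-GB-node-type>"):
    """Build a dict shaped like a Databricks cluster-create request body.

    Hypothetical helper: nothing is sent anywhere, it only assembles
    and sanity-checks the settings.
    """
    if not 0 < min_workers <= max_workers:
        raise ValueError("need 0 < min_workers <= max_workers")
    return {
        "cluster_name": name,
        # Placeholders: the review does not name a runtime or a VM size.
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        # Autoscale between the minimum and maximum worker counts.
        "autoscale": {"min_workers": min_workers, "max_workers": max_workers},
        # Shut the cluster down after this many idle minutes.
        "autotermination_minutes": idle_minutes,
    }


# Shared cluster that several users attach notebooks to.
shared = build_cluster_spec("shared-analytics")

# Dedicated cluster for a user who needs different libraries
# (the sizes here are illustrative, not taken from the review).
dedicated = build_cluster_spec("dedicated-user-libs", min_workers=1, max_workers=3)

print(json.dumps(shared, indent=2))
```

Keeping the shared and dedicated clusters as separate specs mirrors the approach in the review: library conflicts are isolated on a dedicated cluster instead of being resolved on the shared one.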