We primarily use the solution to integrate very large data sets from other environments, such as our SQL environment, and to extract the relevant data before checking it. We also use the solution for streaming workloads on very large servers.
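Assuming the SQL data is read over JDBC, Spark can split one large table read into several parallel queries using its `partitionColumn`, `lowerBound`, `upperBound`, and `numPartitions` options. The plain-Python sketch below is a simplified illustration of that idea, not Spark's own implementation. The function name `jdbc_partition_predicates` and the `id` column are hypothetical. It shows how range predicates can be generated so that every row is read exactly once.

```python
def jdbc_partition_predicates(column, lower_bound, upper_bound, num_partitions):
    """Build one WHERE clause per partition over a numeric column.

    Simplified sketch (hypothetical helper, not Spark code). The bounds only
    set the stride; they do not filter rows, because the first and last
    partitions are open-ended, so rows outside the bounds are still read.
    """
    if num_partitions < 1:
        raise ValueError("num_partitions must be at least 1")
    if num_partitions == 1:
        return ["1=1"]  # a single partition reads the whole table
    stride = (upper_bound - lower_bound) // num_partitions
    predicates = []
    for i in range(num_partitions):
        start = lower_bound + i * stride
        end = start + stride
        if i == 0:
            # First partition also picks up NULLs and values below the range.
            predicates.append(f"{column} < {end} OR {column} IS NULL")
        elif i == num_partitions - 1:
            # Last partition picks up everything from its start upward.
            predicates.append(f"{column} >= {start}")
        else:
            predicates.append(f"{column} >= {start} AND {column} < {end}")
    return predicates


for predicate in jdbc_partition_predicates("id", 0, 100, 4):
    print(predicate)
```

Each predicate would then be issued as a separate query by a separate task, which is what lets a very large table be pulled in parallel rather than through a single connection.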
It is a very fast solution, and it's very easy to use. There are APIs for many languages, such as Scala, Java, R, and Python. The greatest advantage of Spark is that we can run many kinds of analytics, including SQL analytics, graph analytics, etc.
The solution needs to better optimize the shuffling of data between workers.
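One common way to reduce shuffle volume in Spark is to aggregate before the shuffle: `reduceByKey` combines values within each partition before sending them to other workers, while `groupByKey` ships every record. The plain-Python sketch below is not Spark code; the partitions and data are hypothetical. It illustrates why that map-side combine reduces the number of records crossing the shuffle without changing the result.

```python
from collections import defaultdict

# Hypothetical data: each inner list stands for one worker's partition of (key, value) records.
partitions = [
    [("a", 1), ("b", 1), ("a", 1)],
    [("a", 1), ("b", 1), ("b", 1), ("c", 1)],
]


def shuffle_without_combine(parts):
    """Send every record across the shuffle (like groupByKey followed by a sum)."""
    return [record for part in parts for record in part]


def shuffle_with_combine(parts):
    """Pre-aggregate within each partition first (map-side combine, as reduceByKey does)."""
    shuffled = []
    for part in parts:
        local = defaultdict(int)
        for key, value in part:
            local[key] += value
        shuffled.extend(local.items())
    return shuffled


def reduce_by_key(records):
    """Final aggregation on the receiving side of the shuffle."""
    totals = defaultdict(int)
    for key, value in records:
        totals[key] += value
    return dict(totals)


raw = shuffle_without_combine(partitions)
combined = shuffle_with_combine(partitions)
print(len(raw), len(combined))  # prints: 7 5
print(reduce_by_key(raw) == reduce_by_key(combined))  # prints: True
```

The totals are identical either way; only the amount of data moved between workers changes, which is why the choice of operator matters on very large data sets.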
I've been using the solution for four or five years.
The solution is very stable.
The solution is scalable. My understanding is that version 3.0 has improved scaling capabilities and can scale automatically.
Apache Spark is an open-source platform, so there is no official technical support.
We use both on-premises and public and private cloud deployment models. We're partners with Databricks.
I'm a consultant. Our company works for large enterprises such as banks and energy companies. Seventeen of our employees use Apache Spark.
With the cloud, many companies integrate Spark. Most big data projects around the world use Spark, directly or indirectly.
I'd rate the solution eight out of ten.