The good performance. The nice graphical management console. The long list of ML algorithms.
We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.
Apache Spark provides very good performance. However, the tuning phase is still tricky.
I've used it for 2 years.
We didn't have an issue with the deployment.
In the past we deployed Spark 1.3 to use Spark SQL, but unfortunately one of our queries failed because of a bug that was fixed in later releases. We then moved to Spark 1.6, but some queries still failed when run against huge datasets. Now we are using version 2.1: it is more stable, it performs better, and the SQL/ML parts are richer than before.
I've had no issues with the scalability.
I've never had to use customer service.
Technical Support: I've never had to use technical support.
The initial setup is quite complex because you have to set many different configuration parameters that are deployment-specific. It is not trivial to find the correct configuration with so many variables involved.
In-house team. The setup itself is not a problem when you just have to test the system. The challenging part is discovering the optimal configuration needed to obtain a production system that provides good performance.
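To give a rough idea of what "many deployment-specific configuration parameters" looks like in practice, here is a minimal sketch that keeps a handful of real Spark properties (`spark.executor.instances`, `spark.executor.cores`, `spark.executor.memory`, `spark.driver.memory`, `spark.sql.shuffle.partitions`) in one place and renders them either as `spark-submit --conf` flags or as `spark-defaults.conf` lines. The values are placeholders assumed for illustration, not the settings used in this deployment, and the helper names are hypothetical. Finding the right values for a given cluster is the tuning work described above.

```python
# Hypothetical per-deployment settings. The keys are real Spark properties,
# but the values are placeholders that would need tuning for each cluster.
DEPLOYMENT_CONF = {
    "spark.executor.instances": "10",
    "spark.executor.cores": "4",
    "spark.executor.memory": "8g",
    "spark.driver.memory": "4g",
    "spark.sql.shuffle.partitions": "200",
}


def to_submit_args(conf):
    """Hypothetical helper: turn a dict of Spark properties into
    spark-submit arguments of the form ["--conf", "key=value", ...]."""
    args = []
    for key, value in sorted(conf.items()):  # sorted for a stable order
        args.extend(["--conf", f"{key}={value}"])
    return args


def to_defaults_file(conf):
    """Hypothetical helper: render the same properties as spark-defaults.conf
    content, one whitespace-separated "key value" pair per line."""
    return "\n".join(f"{key} {value}" for key, value in sorted(conf.items()))


if __name__ == "__main__":
    # my_job.py is a placeholder application name.
    print(" ".join(["spark-submit", *to_submit_args(DEPLOYMENT_CONF), "my_job.py"]))
    print(to_defaults_file(DEPLOYMENT_CONF))
```

Keeping the properties in a single dict makes it easier to version one configuration per environment (test versus production) and compare them while tuning.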