Apache Spark Review

What is most valuable?

The good performance, the nice graphical management console, and the long list of ML algorithms.

How has it helped my organization?

We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.

What needs improvement?

Apache Spark provides very good performance, but the tuning phase is still tricky.

For how long have I used the solution?

I've used it for 2 years.

What was my experience with deployment of the solution?

We didn't have an issue with the deployment.

What do I think about the stability of the solution?

In the past we deployed Spark 1.3 to use Spark SQL, but unfortunately one of our queries failed because of a bug that was fixed in later releases. Then we moved to Spark 1.6, but some queries were still failing when run against huge datasets. Now we are using version 2.1: it is more stable, it delivers better performance, and the SQL/ML parts are richer than before.

What do I think about the scalability of the solution?

I've had no issues with the scalability.

How is customer service and technical support?

Customer Service:

I've never had to use customer service.

Technical Support:

I've never had to use technical support.

How was the initial setup?

The initial setup is quite complex because you have to set many different configuration parameters that are deployment-specific. It is not trivial to arrive at the correct configuration with so many variables involved.
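To give an idea of what "deployment-specific" means in practice, here is a minimal sketch in Python that derives a starting set of executor settings from the cluster hardware. It is not our actual configuration: the function names, the sizing rule of thumb (about five cores per executor, one core and 1 GB left to the OS on each node, roughly 10% of memory kept for overhead) and the example cluster are all assumptions for illustration. The property names themselves are standard Spark settings.

```python
def size_executors(nodes, cores_per_node, mem_gb_per_node,
                   cores_per_executor=5, reserved_cores=1,
                   reserved_mem_gb=1, overhead_fraction=0.10):
    """Return a starting-point Spark config as a dict of property -> value.

    Hypothetical helper: the defaults encode a common rule of thumb,
    not an official Spark recommendation.
    """
    usable_cores = cores_per_node - reserved_cores
    executors_per_node = usable_cores // cores_per_executor
    if executors_per_node < 1:
        raise ValueError("not enough cores per node for one executor")

    # Leave one executor slot free for the driver / application master.
    instances = max(1, executors_per_node * nodes - 1)

    # Split the remaining node memory between executors, then keep a
    # fraction aside for off-heap overhead.
    mem_per_executor = (mem_gb_per_node - reserved_mem_gb) / executors_per_node
    heap_gb = int(mem_per_executor * (1 - overhead_fraction))

    return {
        "spark.executor.instances": str(instances),
        "spark.executor.cores": str(cores_per_executor),
        "spark.executor.memory": f"{heap_gb}g",
        # Assumption: two shuffle partitions per available core.
        "spark.sql.shuffle.partitions": str(instances * cores_per_executor * 2),
    }


def to_submit_flags(conf):
    """Render a config dict as spark-submit '--conf key=value' arguments."""
    flags = []
    for key, value in conf.items():
        flags.extend(["--conf", f"{key}={value}"])
    return flags


if __name__ == "__main__":
    # Example cluster (made up): 6 nodes, 16 cores and 64 GB each.
    conf = size_executors(nodes=6, cores_per_node=16, mem_gb_per_node=64)
    print(" ".join(to_submit_flags(conf)))
```

The values this produces are only a first guess; they still have to be checked against real workloads.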

What about the implementation team?

In-house team. The setup itself is not a problem when you just have to test the system. The challenging part is discovering the optimal configuration needed to obtain a production system providing good performance.
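One way to approach that search is to benchmark a small grid of candidate configurations and keep the fastest. The Python sketch below shows the idea only; it is not the procedure we followed. The `measure` callable is hypothetical: it is assumed to run a representative job with the given settings and return the wall-clock time in seconds. The grid values are placeholders.

```python
import itertools
import statistics


def candidate_configs(grid):
    """Yield every combination of the values listed for each property."""
    keys = sorted(grid)
    for values in itertools.product(*(grid[k] for k in keys)):
        yield dict(zip(keys, values))


def pick_fastest(grid, measure, repeats=3):
    """Return (config, median_seconds) for the fastest candidate.

    `measure(conf)` is a caller-supplied, hypothetical function that runs
    a representative job with `conf` and returns its duration in seconds.
    Each candidate is run `repeats` times and judged by its median.
    """
    best_conf, best_seconds = None, float("inf")
    for conf in candidate_configs(grid):
        median = statistics.median(measure(conf) for _ in range(repeats))
        if median < best_seconds:
            best_conf, best_seconds = conf, median
    return best_conf, best_seconds


if __name__ == "__main__":
    # Placeholder grid; real candidates depend on the cluster and workload.
    grid = {
        "spark.executor.memory": ["4g", "8g"],
        "spark.sql.shuffle.partitions": ["200", "400"],
    }

    # Stand-in for a real benchmark, used here only to make the sketch run.
    def fake_measure(conf):
        base = 10.0 if conf["spark.executor.memory"] == "4g" else 6.0
        penalty = 1.0 if conf["spark.sql.shuffle.partitions"] == "200" else 0.0
        return base + penalty

    print(pick_fastest(grid, fake_measure))
```

An exhaustive grid grows quickly with the number of parameters, so in practice it makes sense to vary only the few settings that matter most for the workload.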

Disclosure: I am a real user, and this review is based on my own experience and opinions.