Apache Spark Review

I use it to process large amounts of data in the energy industry.


What is most valuable?

Spark is relatively easy to deploy and has rich features for handling big data. We mostly use Spark Core, Spark SQL, and Spark MLlib in our applications.

How has it helped my organization?

I use Spark to process large amounts of data in the energy industry.

What needs improvement?

A good tool for analysing Spark application performance is needed. Right now there are still many parameters to tune in order to get good performance from a Spark application. I would like to see automatic tuning of these parameters.
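To give a sense of the kind of manual tuning involved, here is a minimal Python sketch that derives a few starting values from the cluster size. The helper names (suggest_spark_conf, to_submit_args) and the example cluster size are hypothetical. The "2 to 3 tasks per CPU core" figure is the rule of thumb from the Spark tuning guide, not a guarantee of good performance, and the right values depend on the workload.

```python
def suggest_spark_conf(num_executors, cores_per_executor, tasks_per_core=3):
    """Return starting-point Spark settings as a dict (hypothetical helper).

    Parallelism follows the rule of thumb of 2-3 tasks per CPU core.
    """
    total_cores = num_executors * cores_per_executor
    parallelism = total_cores * tasks_per_core
    return {
        "spark.executor.instances": str(num_executors),
        "spark.executor.cores": str(cores_per_executor),
        # Number of partitions for RDD shuffles when none is given.
        "spark.default.parallelism": str(parallelism),
        # Number of partitions for DataFrame / Spark SQL shuffles.
        "spark.sql.shuffle.partitions": str(parallelism),
        # Kryo is usually faster than the default Java serialization.
        "spark.serializer": "org.apache.spark.serializer.KryoSerializer",
    }


def to_submit_args(conf):
    """Turn a settings dict into a list of spark-submit --conf arguments."""
    args = []
    for key, value in sorted(conf.items()):
        args += ["--conf", f"{key}={value}"]
    return args


# Example cluster size is an assumption for illustration only.
conf = suggest_spark_conf(num_executors=10, cores_per_executor=4)
print(" ".join(to_submit_args(conf)))
```

The printed arguments can be pasted into a spark-submit command line. They are only a first guess. Each value still has to be checked against the Spark UI for the actual job.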

For how long have I used the solution?

I've been using Spark for seven months.

What was my experience with deployment of the solution?

There were no issues with the deployment.

What do I think about the stability of the solution?

I ran into Spark application performance issues. For instance, Spark JDBC write performance needs to be improved.
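For the JDBC write case, the two settings that usually matter first are the batch size and the number of parallel connections. The sketch below is a minimal illustration, not a measured fix: the helper names, the URL, and the table name are hypothetical, while url, dbtable, batchsize, and numPartitions are options of Spark's built-in JDBC data source. The df argument is assumed to be a Spark DataFrame.

```python
def jdbc_write_options(url, table, batch_size=10000, num_partitions=8):
    """Build options for Spark's JDBC data source (hypothetical helper)."""
    return {
        "url": url,
        "dbtable": table,
        # Rows sent per round trip; the Spark default is 1000.
        "batchsize": str(batch_size),
        # Upper bound on concurrent JDBC connections; Spark coalesces
        # the DataFrame down to this many partitions before writing.
        "numPartitions": str(num_partitions),
    }


def write_jdbc(df, options, mode="append"):
    """Write a Spark DataFrame through JDBC using the given options."""
    df.write.format("jdbc").options(**options).mode(mode).save()


# Placeholder URL and table name, for illustration only.
opts = jdbc_write_options(
    "jdbc:postgresql://db.example.com:5432/energy", "meter_readings"
)
print(opts)
```

With a real DataFrame, the call would be write_jdbc(df, opts). A larger batch size and more partitions are not always better, since the database has to absorb that many concurrent connections.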

What do I think about the scalability of the solution?

There were no issues with the scalability.

How are customer service and technical support?

Customer Service:

I use the open-source Apache version. We handle everything on our own.

Technical Support:

I use the open-source Apache version. We handle everything on our own.

Which solution did I use previously and why did I switch?

I evaluated a Hadoop-based solution and chose Spark for its fast processing and ease of use.

How was the initial setup?

The initial setup is not complex. The online documentation is pretty good.

What about the implementation team?

I implemented it in-house.

What other advice do I have?

Get to know how Spark works: what a job, a stage, a task, the DAG, etc. are. It will help you write Spark applications.
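As a rough mental model of those terms: an action triggers a job, the job's DAG of transformations is cut into stages wherever a shuffle is needed, and each stage runs one task per partition. The toy Python sketch below models only that. It does not call Spark, the function names are hypothetical, and it simplifies real behaviour (for example, it assumes the partition count stays the same after a shuffle).

```python
# Transformations that need a shuffle ("wide" dependencies).
WIDE = {"reduceByKey", "groupByKey", "join", "repartition", "sortByKey"}


def split_into_stages(transformations):
    """Cut a linear chain of transformations into stages at each shuffle."""
    stages = [[]]
    for name in transformations:
        # A wide transformation starts a new stage, unless the current
        # stage is still empty.
        if name in WIDE and stages[-1]:
            stages.append([])
        stages[-1].append(name)
    return stages


def describe_job(transformations, num_partitions):
    """Describe the single job that one action would trigger."""
    return [
        {"stage": index, "ops": ops, "tasks": num_partitions}
        for index, ops in enumerate(split_into_stages(transformations))
    ]


# A word-count style chain, ended by an action such as collect().
plan = describe_job(["textFile", "flatMap", "map", "reduceByKey"], num_partitions=8)
for stage in plan:
    print(stage)
```

In a real application, the Spark UI shows the actual jobs, stages, and tasks, and that is the place to compare against this picture.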

Disclosure: I am a real user, and this review is based on my own experience and opinions.