Apache Spark Review

Helped us reduce the 3TB Google Ngrams dataset in hours instead of days

What is most valuable?

The most valuable features are fault tolerance and easy integration with other workloads such as machine learning and graph analytics. The community is growing, so support for running ML in a distributed fashion is quite good.

How has it helped my organization?

Previously, we used Hadoop MapReduce to reduce the Google Ngrams dataset (3TB), which took approximately five days on our cluster. After switching to Spark, we were able to accomplish the same task within hours.

What needs improvement?

The product is already improving quickly because the community is developing it so actively. More ML algorithms should be added to give developers a richer algorithm library.

For how long have I used the solution?

Two and a half years.

What do I think about the stability of the solution?

I did not encounter any stability problems. It is also quite backward compatible.

What do I think about the scalability of the solution?

I have not encountered any scalability issues so far; it is quite scalable. Using simple scripts, you can add as many workers as you want.

What other advice do I have?

This is a very good product for big data analytics, and it integrates well with other components such as machine learning and graph analytics.

Disclosure: I am a real user, and this review is based on my own experience and opinions.