Apache Spark Review

We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.


Valuable Features:

Spark Streaming, Spark SQL and MLlib, in that order.

Improvements to My Organization:

We have been using Spark to do a lot of batch and stream processing of inbound data from Apache Kafka. Scaling Spark on YARN is still an issue but we are getting acceptable performance.

Room for Improvement:

As I said, scalability is still an issue, and so is stability. Spark on YARN still doesn't seem to have a programmatic submission API, so we have to rely on the spark-submit script to run jobs on YARN. The Scala and Java APIs also differ in performance, which sometimes forces us to code in Scala.
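
To illustrate the spark-submit workaround, here is a minimal Java sketch of wrapping the script from application code. It is only a sketch under assumptions: the class name SparkSubmitWrapper, the job class com.example.IngestJob, the jar name and the topic argument are all hypothetical, and it assumes spark-submit is on the PATH and that the installed Spark version accepts "--master yarn --deploy-mode cluster".

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: shells out to the spark-submit script instead of
// calling a submission API directly.
class SparkSubmitWrapper {

    // Builds the argument list for a spark-submit call that targets YARN.
    static List<String> buildCommand(String mainClass, String appJar, List<String> appArgs) {
        List<String> cmd = new ArrayList<>();
        cmd.add("spark-submit");      // assumption: the script is on the PATH
        cmd.add("--master");
        cmd.add("yarn");
        cmd.add("--deploy-mode");
        cmd.add("cluster");           // driver runs inside the YARN cluster
        cmd.add("--class");
        cmd.add(mainClass);
        cmd.add(appJar);
        cmd.addAll(appArgs);          // everything after the jar goes to the job
        return cmd;
    }

    // Launches the command, forwards its output to this process's console,
    // and returns the script's exit code.
    static int run(List<String> cmd) throws IOException, InterruptedException {
        Process process = new ProcessBuilder(cmd).inheritIO().start();
        return process.waitFor();
    }

    public static void main(String[] args) {
        // Hypothetical job class, jar and argument, for illustration only.
        List<String> cmd = buildCommand(
                "com.example.IngestJob", "ingest-job.jar", Arrays.asList("inbound-topic"));
        // Only prints the command; call run(cmd) to actually submit the job.
        System.out.println(String.join(" ", cmd));
    }
}
```

A wrapper like this only sees the script's exit code and console output, so any tracking of the job beyond that would have to go through YARN itself.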

Other Advice:

Have Scala developers on hand. Basic Java competency will not be enough during optimization rounds.

Disclosure: I am a real user, and this review is based on my own experience and opinions.