Apache Spark Review

Spark provides lots of high-level APIs, which reduces duplication of work.


Valuable Features

Streaming data processing

Improvements to My Organization

In the previous version, we used Storm to handle real-time data; however, its performance didn't meet our requirements. Spark Streaming's micro-batch mode helped improve performance. Also, Spark provides lots of high-level APIs, which reduces duplication of work.
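For readers unfamiliar with micro-batching, the idea can be sketched in plain Python. This is not Spark's API and not our production code; the function names, the one-second interval, and the sample events are all made up for illustration. Instead of handling each record as it arrives, records are grouped by a fixed time interval and each group is processed in a single call.

```python
from collections import defaultdict


def micro_batches(events, batch_interval):
    """Group (timestamp, value) events into fixed-length time windows.

    Hypothetical helper for illustration only; Spark Streaming does this
    internally based on the configured batch interval.
    """
    batches = defaultdict(list)
    for timestamp, value in events:
        # Integer division assigns each event to the interval that contains it.
        batch_index = int(timestamp // batch_interval)
        batches[batch_index].append(value)
    # Return the non-empty batches in time order.
    return [batches[i] for i in sorted(batches)]


def process_batch(batch):
    """Handle a whole batch in one call; here it simply counts values."""
    counts = defaultdict(int)
    for value in batch:
        counts[value] += 1
    return dict(counts)


# Sample events as (seconds since start, payload); values are invented.
events = [(0.1, "a"), (0.4, "b"), (0.9, "a"), (1.2, "c"), (2.5, "a"), (2.7, "a")]
results = [process_batch(b) for b in micro_batches(events, batch_interval=1.0)]
print(results)
```

The per-record overhead (scheduling, bookkeeping) is paid once per batch rather than once per event, which is roughly why this mode can give better throughput at the cost of some latency.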

Room for Improvement

Better monitoring, especially monitoring that integrates with customer code.

Use of Solution

I've used it for one year.

Stability Issues

We ran into some issues with standalone deployment, which showed that its stability is not that good, so we plan to switch to YARN or Mesos mode.

Customer Service and Technical Support

I have to say it is bad. The only place I can ask for help is the Google group, and it is run by developers, for developers. There are almost no people from Databricks there. I also use the Cassandra-Spark connector, and DataStax has at least one dedicated person helping the community.

Initial Setup

Standalone deployment is not that straightforward; there are some tricks that are not mentioned in the docs.

Implementation Team

We did it in-house.

Pricing, Setup Cost and Licensing

So far, we have no plans to switch to a commercial license.

Other Advice

I love Spark more than other solutions.

Disclosure: I am a real user, and this review is based on my own experience and opinions.