Apache Spark Review

Spark provides lots of high-level APIs, which reduces duplication of work.

Valuable Features

Streaming data processing

Improvements to My Organization

In the previous version, we used Storm to handle real-time data; however, its performance didn't meet our requirements. Spark Streaming's micro-batch mode helped improve performance. Also, Spark provides many high-level APIs, which reduces duplication of work.
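
For readers unfamiliar with the micro-batch idea, here is a minimal sketch in plain Python. It is not Spark's API and not our production code; the event shape, the function names, and the one-second interval are all assumptions made up for illustration. The point is only that events are grouped into small time windows and each window is processed in one call, rather than handling every record on its own.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

# Hypothetical event shape: (timestamp in seconds, payload).
Event = Tuple[float, str]


def micro_batches(events: Iterable[Event], interval: float) -> List[List[str]]:
    """Group events into consecutive fixed-length time windows."""
    buckets: Dict[int, List[str]] = defaultdict(list)
    for ts, payload in events:
        # Events whose timestamps fall in the same window share a bucket.
        buckets[int(ts // interval)].append(payload)
    # Return the non-empty windows in time order.
    return [buckets[k] for k in sorted(buckets)]


def count_words(batch: List[str]) -> Dict[str, int]:
    """Process a whole batch in one call (here: a simple count)."""
    counts: Dict[str, int] = defaultdict(int)
    for word in batch:
        counts[word] += 1
    return dict(counts)


# Illustrative data only.
events = [(0.1, "a"), (0.4, "b"), (0.9, "a"), (1.2, "c"), (2.7, "a"), (2.8, "a")]
batches = micro_batches(events, interval=1.0)
results = [count_words(b) for b in batches]
```

In a real deployment the batching, scheduling, and fault tolerance are handled by the framework; the sketch only shows the grouping step that gives the mode its name.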

Room for Improvement

Better monitoring capabilities, especially monitoring that integrates with customer code.

Use of Solution

I've used it for one year.

Stability Issues

We ran into some issues with standalone deployment, which showed that its stability is not that good. So we plan to switch to YARN or Mesos mode.

Customer Service and Technical Support

I have to say it is bad. The only place I can ask for help is the Google group, which is run in a developer-for-developer style, and there are almost no people from Databricks. By contrast, I also use the Spark Cassandra Connector, and DataStax has at least one dedicated person to help the community.

Initial Setup

Not that straightforward in terms of standalone deployment. There are some tricks that are not mentioned in the docs.

Implementation Team

We did it in-house.

Pricing, Setup Cost and Licensing

So far, we have no plans to switch to a commercial license.

Other Advice

I prefer Spark to other solutions.

Disclosure: I am a real user, and this review is based on my own experience and opinions.