Apache Spark Review

It provides large-scale data processing with negligible latency, for the cost of commodity hardware.

Valuable Features:

The most important feature of Apache Spark is that it provides large-scale data processing with negligible latency, for the cost of commodity hardware. The Spark framework is a real improvement over Hadoop: the latter does not allow fast data processing, whereas Spark achieves it through in-memory data processing.

Improvements to My Organization:

Apache Spark is a framework that lets an organization perform business and data analytics at a very low cost compared with Ab-Initio or Informatica. By using Apache Spark in place of those tools, an organization can cut costs substantially without compromising data security or running into other data-related issues, provided the work is overseen by an expert Scala programmer. Apache Spark also avoids the high-latency overhead of Hadoop. My organization has benefited on all of these points.

Room for Improvement:

Developers always have improvements in mind. One of the most common requests is a user-friendly GUI with drag-and-drop support; if one were added to this framework, it would be easier to use.

There is also always room for improvement in memory usage. If a future version could process data using less memory, that would obviously be better.

Deployment Issues:

We've had no issues with deployment.

Stability Issues:

See above regarding memory usage.

Scalability Issues:

We've had no issues with scalability.

Other Advice:

My advice to others is simply to use Apache Spark for large-scale data processing, as it provides good performance at low cost, unlike Ab-Initio or Informatica. The main problem is that there are currently not many people on the market who are certified in Apache Spark.

Disclosure: I am a real user, and this review is based on my own experience and opinions.

Comment from CEO at Trendalyze Decisions Ltd:

The drag-and-drop GUI comment is very true. We developed such a GUI for spatial and time series data in Spark, but there are other tools out there. Maybe you should do a review of such tools.