Apache Spark Review

It allows you to construct event-driven information systems.


Valuable Features

Spark Streaming, which allows you to construct event-driven information systems and respond to the events in near-real time.

Improvements to My Organization

Apache Spark’s ability to perform batch processing at one second or less intervals is the most transformative and less pervasive for any data processing application. The ingested data can also be validated and verified for quality early in the data pipeline.

Room for Improvement

Apache Spark as a data processing engine has come a long way since its inception. Although you are able to perform complex transformations using Spark libraries, the support for SQL to perform transformations is still limited. You can alleviate some of these limitations by running Spark within Hadoop ecosystem and by leveraging the fairly evolved HiveQL.

Use of Solution

I've used it for 16 months.

Deployment Issues

The enterprise scale deployment of Apache Spark is slightly involved to derive its full potential of stability, scalability and security. However, some Hadoop vendors like Cloudera have integrated Spark data processing engine into their Hadoop platforms and have made it easier to deploy, scale and secure.

Customer Service and Technical Support

This is an open source technology and is dependent on community support. The Apache Spark community is vibrant and it is easy to find answers to questions. The enterprises can also get commercial support from Hadoop vendors such as Cloudera. I recommend enterprises to inspect Hadoop vendors’ commitment to open source as well as their ability to curate Apache Spark technology into the rest of the ecosystem before signing up for a commercial support or subscription.

Initial Setup

The initial set-up is straightforward as long as you have picked a right Hadoop distribution.

Implementation Team

I recommend engaging an experienced Hadoop vendor during the planning and initial implementation phases of the project. You will be able to avoid any potential pitfalls or reduce overall project time by having a Hadoop expert guiding you during the initial stages of the project.

Other Solutions Considered

I evaluated some other technologies such as Samza but community backing for Apache Spark stood out.

Other Advice

I also suggest having a Chief Technologist who has extensive experience in architecting several Big Data solutions. They should be able to communicate in business as well as technology language. Their expertise should range from infrastructure to application development and have command of Hadoop technologies.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Add a Comment
Guest
Sign Up with Email