Apache Spark Review

Valuable Features

It allows the loading and investigation of very large data sets, and it includes MLlib for machine learning, Spark Streaming, and both the old and new DataFrame APIs.
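
To give a rough idea of what loading and investigating a data set looks like, here is a minimal Scala sketch. It assumes Spark 2.0 or later (for `SparkSession`) and runs in local mode. The object name, file contents, and column names (`city`, `distance_km`) are made up for illustration and are not from any real project.

```scala
import java.nio.file.Files

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.{avg, col}

object ExploreSketch {
  def main(args: Array[String]): Unit = {
    // Local session for experimenting; on a cluster the master is normally set by spark-submit.
    val spark = SparkSession.builder()
      .appName("explore-sketch")
      .master("local[*]")
      .getOrCreate()

    // Hypothetical sample data, written to a temp file so the sketch is self-contained.
    val path = Files.createTempFile("trips", ".csv")
    Files.write(path, "city,distance_km\nOslo,3.5\nOslo,7.0\nBergen,2.0\n".getBytes("UTF-8"))

    // Load the CSV into a DataFrame, letting Spark infer the column types.
    val trips = spark.read
      .option("header", "true")
      .option("inferSchema", "true")
      .csv(path.toString)

    // Basic data discovery: inspect the schema, then aggregate per city.
    trips.printSchema()
    trips.groupBy("city")
      .agg(avg(col("distance_km")).as("avg_km"))
      .orderBy("city")
      .show()

    spark.stop()
  }
}
```

The same read, group, and aggregate calls apply unchanged to much larger files, since Spark splits the work across partitions rather than loading everything into one process.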

Improvements to My Organization

We're able to perform data discovery on large datasets without too much difficulty.

Room for Improvement

It needs better documentation, with examples, for all of the Spark libraries. That would make it much easier to get the most out of its capabilities.

Use of Solution

I've used it for over nine months now.

Deployment Issues

I haven't encountered any issues with deployment.

Stability Issues

There have been no stability issues.

Scalability Issues

I haven't had any scalability issues. It scales better than the Python and R setups I used before.

Customer Service and Technical Support

Customer Service:

I haven't had to use customer service.

Technical Support:

I haven't had to use technical support.

Previous Solutions

I previously used Python and R, but neither of these scaled particularly well.

Initial Setup

The initial setup was complex. It was not easy getting the correct version and dependencies set up.

Implementation Team

I implemented it in-house on my own!

ROI

It's open source, so ROI is not applicable.

Other Advice

Learn Scala, as this will greatly reduce the pain of getting started with Spark.

Disclosure: I am a real user, and this review is based on my own experience and opinions.