Apache Spark Review

What is most valuable?

It allows loading and investigating very large data sets, and it includes MLlib for machine learning, Spark Streaming, and both the old and new DataFrame APIs.

How has it helped my organization?

We're able to perform data discovery on large datasets without too much difficulty.

What needs improvement?

It needs better documentation, as well as examples for all of the Spark libraries. That would be very helpful in getting the most out of its capabilities.

For how long have I used the solution?

I've used it for over nine months now.

What was my experience with deployment of the solution?

I haven't encountered any issues with deployment.

What do I think about the stability of the solution?

There have been no stability issues.

What do I think about the scalability of the solution?

I haven't had any scalability issues. It scales better than Python and R.

How are customer service and technical support?

Customer Service:

I haven't had to use customer service.

Technical Support:

I haven't had to use technical support.

Which solution did I use previously and why did I switch?

I previously used Python and R, but neither of these scaled particularly well.

How was the initial setup?

The initial setup was complex. It was not easy getting the correct version and dependencies set up.
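
As a rough illustration of the version pinning involved, a minimal sbt build might look like the sketch below. The project name and all version numbers are assumptions chosen for illustration, not the ones used for this review; the point is only that the Scala version has to match one the chosen Spark release was built for, and that all Spark modules should share a single version.

```scala
// build.sbt: a hypothetical minimal build; the name and versions are illustrative assumptions.
name := "spark-review-example"

// Spark artifacts are published per Scala binary version, so this must match
// a Scala version that the chosen Spark release was built for.
scalaVersion := "2.12.18"

// Assumed Spark release; keep every Spark module on the same version.
val sparkVersion = "3.5.0"

libraryDependencies ++= Seq(
  // %% appends the Scala binary version (here _2.12) to the artifact name.
  // "provided" keeps Spark out of the packaged jar, since the cluster supplies it at run time.
  "org.apache.spark" %% "spark-sql"       % sparkVersion % "provided",
  "org.apache.spark" %% "spark-mllib"     % sparkVersion % "provided",
  "org.apache.spark" %% "spark-streaming" % sparkVersion % "provided"
)
```

Mismatched Scala or Spark versions between modules are a common source of the kind of dependency trouble described above, so defining the version once and reusing it is a reasonable default.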

What about the implementation team?

I implemented it in-house on my own!

What was our ROI?

It's open source, so ROI is not applicable.

What other advice do I have?

Learn Scala, as this will greatly reduce the pain of getting started with Spark.

Disclosure: I am a real user, and this review is based on my own experience and opinions.