Apache Spark Review

Having everything in the same framework has helped us out a lot

What is most valuable?

ETL and streaming capabilities.

How has it helped my organization?

Spark has made big data processing more convenient. A uniform framework also makes us more efficient, since the same framework can be used for both batch and stream processing.

What needs improvement?

API stability. The transitions from RDDs to DataFrames, and then to Datasets, were difficult.

For how long have I used the solution?

I have used Spark since March 2015, from Spark 1.1 onwards.

Currently, I use Spark 2.2 extensively.

What do I think about the stability of the solution?

We occasionally ran into stability issues with different APIs.

What do I think about the scalability of the solution?


How is customer service and technical support?

Since we were using the open source version of Apache Spark, without Databricks support, we never used technical support from Databricks.

Which solutions did we use previously?

Yes, we used Hive, Pig, and Storm. Having everything in the same framework has helped us out a lot.

Which other solutions did I evaluate?

Yes, we considered other products in the big data ecosystem.

What other advice do I have?

Go for it.

Disclosure: I am a real user, and this review is based on my own experience and opinions.