Apache Spark Archived Reviews (More than two years old)

Filter by:Reset all filters
Filter Unavailable
Company Size
Filter Unavailable
Job Level
Filter Unavailable
Filter Unavailable
Filter Unavailable
Order by:
  • Date
  • Highest Rating
  • Lowest Rating
  • Review Length
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25 2017

What do you think of Apache Spark?

What is most valuable?

The good performance. The nice graphical management console. The long list of ML algorithms.

How has it helped my organization?

We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.

What needs improvement?

Apache Spark provides very good performance The tuning phase is still tricky.

For how long have I used the solution?

I've used it for 2 years.

What was my experience with deployment of the solution?

We didn't have an issue with the deployment.

What do I think about the stability of the solution?

In the past we deployed Spark 1.3 to use Spark SQL but unfortunately one of our queries failed because of a bug fixed in following releases. Then we moved to Spark 1.6 but still some queries were…
Chief System Architect at a marketing services firm with 501-1,000 employees
Mar 30 2016

What is most valuable?

With spark SQL we've now the capabilities to analyse very large quantities of data located in S3 on Amazon at very low cost comparing other solution we checked. We also use our own Spark cluster to aggregate data on near real time and save… more»

How has it helped my organization?

Until Spark we didn't have the ability to analyse this quantity of data we're talking about two TB/hour. So we're now able to produce a lot of reports, and are also able to develop machine learning based analysis to optimize our business… more»

What needs improvement?

Spark is actually very good for batch analysis much more good than Hadoop, it's much simple, much more quicker etc., but it actually lacks the ability to perform real-time querying like Vertica or Redshift. Also, it is more difficult for an… more»

Which solution did I use previously and why did I switch?

Yes to make this job we've used a MySQL database. We switch because MySQL is not a scalable solution and we've reach it's limits.

Which other solutions did I evaluate?

Yes we've started to evaluate analytics databases : vertica, exasol, and other for all the them the price was an issue regarding the quantity of data we want to manipulate.

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflowstructure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Download our free Hadoop Report and find out what your peers are saying about Apache, Informatica, VMware, and more!