What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
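
The difference between the two dataflows can be illustrated with a minimal single-machine sketch in plain Python. It does not use Spark or Hadoop, and the names `mapreduce_pass` and `run_demo` are hypothetical helpers invented for this illustration. The first style re-reads its input from disk and writes its result back to disk on every job. The second loads the data once into a read-only in-memory collection and reuses it, which is roughly the role the description above assigns to an RDD (without the distribution or fault tolerance).

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_pass(in_path, out_path, map_fn, reduce_fn, initial):
    """One MapReduce-style job: read from disk, map, reduce, write to disk."""
    with open(in_path) as f:
        records = [json.loads(line) for line in f]
    mapped = map(map_fn, records)
    result = reduce(reduce_fn, mapped, initial)
    with open(out_path, "w") as f:
        json.dump(result, f)
    return result


def run_demo():
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "input.jsonl")
    with open(in_path, "w") as f:
        for n in [1, 2, 3, 4]:
            f.write(json.dumps(n) + "\n")

    # MapReduce style: each job goes disk -> map -> reduce -> disk,
    # so the second job has to re-read the same input file.
    total = mapreduce_pass(in_path, os.path.join(workdir, "sum.json"),
                           lambda x: x, lambda a, b: a + b, 0)
    squares = mapreduce_pass(in_path, os.path.join(workdir, "squares.json"),
                             lambda x: x * x, lambda a, b: a + b, 0)

    # Working-set style: load once into a read-only tuple and reuse it
    # for several computations without touching the disk again.
    with open(in_path) as f:
        working_set = tuple(json.loads(line) for line in f)
    total_mem = sum(working_set)
    squares_mem = sum(x * x for x in working_set)

    return total, squares, total_mem, squares_mem
```

Both styles compute the same values; the sketch only shows where the disk reads and writes fall in each case.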

Sample customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Apache Spark Reviews

4.2 out of 5 stars (10)
Hadoop report from it central station 2017 04 22
Find out what your peers are saying about Cloudera, IBM, Hortonworks and others in Hadoop.
209,368 professionals have used our research on 5,561 solutions.

Products most compared with Apache Spark
Read reviews of Apache Spark competitors and alternatives from the IT Central Station community.
Stats based on 15,938 user comparisons.
