What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
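
The contrast between the two dataflows can be sketched on a single machine in plain Python. This is only a toy illustration and does not use Spark's or Hadoop's actual APIs: the names mapreduce_pass and run_demo, the JSON-lines file layout, and the sample numbers are all assumptions made for the example. The first half runs two MapReduce-style jobs, each reading its input from disk and writing its result back to disk. The second half loads the data once into a read-only in-memory working set and reuses it for both computations, which is roughly the role an RDD plays.

```python
import json
import os
import tempfile
from functools import reduce


def mapreduce_pass(in_path, out_path, map_fn, reduce_fn, initial):
    """Hypothetical helper: one linear MapReduce-style pass.

    Reads records from disk, maps a function over them, reduces the
    mapped values, and stores the reduction result on disk.
    """
    with open(in_path) as f:
        records = [json.loads(line) for line in f]
    mapped = map(map_fn, records)
    result = reduce(reduce_fn, mapped, initial)
    with open(out_path, "w") as f:
        f.write(json.dumps(result) + "\n")
    return result


def run_demo():
    """Compute a sum and a sum of squares in both styles."""
    workdir = tempfile.mkdtemp()
    in_path = os.path.join(workdir, "input.jsonl")
    with open(in_path, "w") as f:
        for n in [1, 2, 3, 4]:  # sample data, assumed for the example
            f.write(json.dumps(n) + "\n")

    def add(a, b):
        return a + b

    # MapReduce style: each job re-reads the input from disk
    # and writes its own output file.
    total = mapreduce_pass(
        in_path, os.path.join(workdir, "sum.json"), lambda x: x, add, 0
    )
    sum_sq = mapreduce_pass(
        in_path, os.path.join(workdir, "sum_sq.json"), lambda x: x * x, add, 0
    )

    # Working-set style: read once, keep an immutable in-memory
    # collection, and reuse it across computations.
    with open(in_path) as f:
        working_set = tuple(json.loads(line) for line in f)
    total_mem = sum(working_set)
    sum_sq_mem = sum(x * x for x in working_set)

    return total, sum_sq, total_mem, sum_sq_mem


if __name__ == "__main__":
    print(run_demo())
```

Both styles produce the same answers. The difference is that the MapReduce-style jobs pay for a disk read and a disk write per job, while the working-set version touches the disk once. The sketch leaves out distribution across machines and fault tolerance, which are the parts a real RDD implementation adds.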

Sample customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Apache Spark Reviews

4.2 out of 5 stars (10)
Hadoop report from IT Central Station, 2017-04-22
Find out what your peers are saying about Cloudera, IBM, Hortonworks and others in Hadoop.
203,001 professionals have used our research on 5,504 solutions.

