What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items, distributed over a cluster of machines and maintained in a fault-tolerant way. Spark and its RDDs were developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
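
To make the RDD idea more concrete, here is a minimal single-process sketch in plain Python. It is not Spark's API. The class ToyRDD and its methods map, partition, lose_partition and reduce are hypothetical names chosen for illustration. The sketch models three properties. The data is split into partitions. A transformation never modifies an existing dataset; it returns a new one that remembers its parent and the function applied (its lineage). A lost in-memory partition is rebuilt by re-running that lineage rather than being copied from a replica.

```python
import functools
from typing import Callable, Dict, List, Optional, Tuple


class ToyRDD:
    """Hypothetical single-process model of an RDD (not the Spark API)."""

    def __init__(self, source: Optional[List[List[int]]] = None,
                 parent: Optional["ToyRDD"] = None,
                 fn: Optional[Callable[[int], int]] = None) -> None:
        # Source data stands in for stable storage (e.g. files on disk) and is never lost.
        self._source = [tuple(p) for p in source] if source is not None else None
        self._parent = parent  # lineage: the dataset this one was derived from
        self._fn = fn          # lineage: the function applied to each parent element
        self._cache: Dict[int, Tuple[int, ...]] = {}  # in-memory working set
        if self._source is not None:
            self.num_partitions = len(self._source)
        else:
            assert parent is not None, "a derived ToyRDD needs a parent"
            self.num_partitions = parent.num_partitions

    def map(self, fn: Callable[[int], int]) -> "ToyRDD":
        # Transformations never modify self: they return a new dataset recording its lineage.
        return ToyRDD(parent=self, fn=fn)

    def partition(self, i: int) -> Tuple[int, ...]:
        # Computed lazily on first access; recomputed from lineage if the cached copy was lost.
        if i not in self._cache:
            if self._source is not None:
                self._cache[i] = self._source[i]
            else:
                self._cache[i] = tuple(self._fn(x) for x in self._parent.partition(i))
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulates a worker failure that drops one in-memory partition.
        self._cache.pop(i, None)

    def reduce(self, op: Callable[[int, int], int]) -> int:
        # An "action": combines every element across all partitions.
        values = [x for i in range(self.num_partitions) for x in self.partition(i)]
        return functools.reduce(op, values)


base = ToyRDD(source=[[1, 2], [3, 4]])
squares = base.map(lambda x: x * x)
print(squares.reduce(lambda a, b: a + b))  # 30

squares.lose_partition(1)                  # the partition (9, 16) disappears
print(squares.reduce(lambda a, b: a + b))  # 30 again: rebuilt from lineage
print(base.partition(0))                   # (1, 2): the parent was never modified
```

Real Spark spreads partitions across executors on many machines, evaluates transformations lazily until an action such as reduce runs, and keeps a dataset in memory only when the program asks it to be persisted. This toy caches every partition it computes to keep the example short.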

Sample customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Apache Spark Reviews

4.3 out of 5 stars (10)

Apache Spark Consultants

Request a call with one of our top consultants and experts in Apache Spark.