What is Apache Spark?

Apache Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD): a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store the reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
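
The contrast between the two dataflows can be sketched in plain Python. This is only an illustration under stated assumptions: it runs on one machine, it does not use the Spark or Hadoop APIs, and the helper names mapreduce_pass and working_set_passes are hypothetical ones chosen for this example. The first helper follows the linear read, map, reduce, store sequence described above; the second keeps a read-only working set in memory and reuses it across several passes, which is the role the text assigns to RDDs.

```python
import json
import tempfile
from functools import reduce
from pathlib import Path


def mapreduce_pass(input_path, output_path, map_fn, reduce_fn, initial):
    """One MapReduce-style pass: read from disk, map, reduce, store on disk."""
    with open(input_path) as f:
        records = [json.loads(line) for line in f]
    mapped = map(map_fn, records)
    result = reduce(reduce_fn, mapped, initial)
    with open(output_path, "w") as f:
        json.dump(result, f)
    return result


def working_set_passes(records, passes):
    """Keep a read-only working set in memory and run several passes over it.

    Each pass is a (map_fn, reduce_fn, initial) triple. The data is loaded
    once and never written back to disk between passes.
    """
    cached = tuple(records)  # immutable stand-in for a read-only dataset
    return [reduce(r, map(m, cached), init) for m, r, init in passes]


# Linear dataflow: every pass starts and ends on disk.
with tempfile.TemporaryDirectory() as tmp:
    input_path = Path(tmp) / "input.jsonl"
    output_path = Path(tmp) / "sum_of_squares.json"
    input_path.write_text("\n".join(json.dumps(n) for n in [1, 2, 3, 4]))

    disk_result = mapreduce_pass(
        input_path, output_path, lambda x: x * x, lambda a, b: a + b, 0
    )
    stored_result = json.loads(output_path.read_text())

# Working-set dataflow: two different computations reuse the same in-memory data.
memory_results = working_set_passes(
    [1, 2, 3, 4],
    [
        (lambda x: x * x, lambda a, b: a + b, 0),  # sum of squares
        (lambda x: x, max, float("-inf")),         # maximum value
    ],
)
```

In the first style, a second computation over the same input would have to read it from disk again; in the second, the cached data is simply reused. Real RDDs add distribution across a cluster and fault tolerance, neither of which this sketch attempts to model.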

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions

Apache Spark Reviews

4.3 out of 5 stars (13)
Find out what your peers are saying about Cloudera, IBM, Apache and others in Hadoop.
238,775 professionals have used our research on 5,955 solutions.


Apache Spark Consultants


Request a call with one of our top consultants and experts in Apache Spark.
Big Data and Cloud Solution Consultant
