Apache Spark Reviews

4.3 out of 5 stars (5)
Consultant
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25 2017

What do you think of Apache Spark?

• Valuable Features: Good performance, a nice graphical management console, and a long list of ML algorithms.
• Improvements to My Organization: We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.
• Room for Improvement: Apache Spark provides very good performance, but the tuning phase is still tricky.
• Use of Solution: I've used it for 2 years.
• Deployment Issues: We didn't have an issue with the deployment.
• Stability Issues: In the past we deployed Spark 1.3 to use Spark SQL, but unfortunately one of our queries failed because of a bug that was fixed in later releases. We then moved to Spark 1.6, but some queries still failed when run against huge datasets. Now we are using version 2.1: it is more stable, it...
Real User
Sr. Software Engineer at a tech vendor with 1-10 employees
Oct 01 2017

What is most valuable?

The most valuable features are fault tolerance and easy binding with other processes such as machine learning and graph analytics. The community is growing, and as a result, executing ML in a distributed fashion works quite well.

How has it helped my organization?

Previously we were using Hadoop MapReduce to reduce the Google Ngrams (3TB), which took us approximately five days on our cluster. After using Spark, we were able to accomplish this task within hours.

What needs improvement?

This product is already improving, as the community is developing it rapidly. More ML-based algorithms should be added to make it algorithm-rich for developers.
Find out what your peers are saying about Apache, Pivotal, IBM and others in Hadoop.
265,036 professionals have used our research since 2012.
Real User
Architect at a healthcare company with 51-200 employees
Sep 27 2017

What do you think of Apache Spark?

• Valuable Features: ETL and streaming capabilities.
• Improvements to My Organization: It made Big Data processing more convenient, and a uniform framework adds to efficiency of usage, since the same framework can be used for batch and stream processing.
• Room for Improvement: Stability in terms of API (things were difficult when transitioning from RDDs to DataFrames, then to Datasets).
• Use of Solution: I have used Spark since its inception in March 2015, from Spark 1.1 onwards. Currently, I use 2.2 extensively.
• Stability Issues: Yes, occasionally with different APIs.
• Scalability Issues: No.
• Customer Service and Technical Support: Since we were using the open source version of Apache Spark, without the Databricks support, we never used technical support from...
Consultant
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10 2017

What do you think of Apache Spark?

• Improvements to My Organization: Organisations can now harness richer data sets and benefit from use cases which add value to their business functions.
• Valuable Features: Distributed in-memory processing. Some of the algorithms are resource heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
• Room for Improvement: Include more machine learning algorithms and the ability to handle true streaming of data rather than micro-batch processing.
• Use of Solution: Three to five years.
• Stability Issues: At times, when users do not know how to use Spark and request a lot of resources, the underlying JVMs can crash, which is a big source of worry.
• Scalability Issues: No...
Consultant
Big Data and Cloud Solution Consultant at a consultancy with 10,001+ employees
Oct 02 2017

What do you think of Apache Spark?

• Valuable Features: DataFrame: Spark SQL makes it possible to create applications more easily and with less coding effort.
• Improvements to My Organization: We developed a tool for data ingestion from HDFS->Raw->L1 layer, with data quality checks, loading data into Elasticsearch, and performing CDC.
• Room for Improvement: Dynamic DataFrame options are not yet available.
• Use of Solution: One and a half years.
• Stability Issues: No.
• Scalability Issues: No.
• Other Advice: Spark gives the flexibility for developing custom applications.

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
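
The linear dataflow described above (read input, map a function across it, reduce the mapped results, store the output) can be sketched on a single machine in plain Python. This is an illustration only: it uses no Spark or Hadoop API, and the helper names map_phase, shuffle, reduce_phase and word_count are hypothetical, invented for this example.

```python
from collections import defaultdict
from functools import reduce


def map_phase(records, mapper):
    """Apply the mapper to every input record, yielding (key, value) pairs."""
    for record in records:
        yield from mapper(record)


def shuffle(pairs):
    """Group all values that share a key, as happens between map and reduce."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups, reducer):
    """Fold each key's list of values into a single result."""
    return {key: reduce(reducer, values) for key, values in groups.items()}


def word_count(lines):
    """One linear pass: map each line to (word, 1) pairs, then sum per word."""
    pairs = map_phase(
        lines,
        lambda line: ((word.lower(), 1) for word in line.split()),
    )
    return reduce_phase(shuffle(pairs), lambda a, b: a + b)


# Hypothetical in-memory input; a real MapReduce job would read it from disk.
lines = ["spark is fast", "spark is general"]
counts = word_count(lines)
print(counts)
```

In a real MapReduce job, the input would be read from disk and the result written back to disk at the end of every such pass, so a second computation over the same data would pay that cost again. The working-set idea behind RDDs is to keep a dataset like `lines` available in memory across several computations.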

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
