Apache Spark Reviews

Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25 2017

What do you think of Apache Spark?

Valuable Features: Good performance, a nice graphical management console, and a long list of ML algorithms.
Improvements to My Organization: We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.
Room for Improvement: Apache Spark provides very good performance, but the tuning phase is still tricky.
Use of Solution: I've used it for 2 years.
Deployment Issues: We didn't have an issue with the deployment.
Stability Issues: In the past we deployed Spark 1.3 to use Spark SQL, but unfortunately one of our queries failed because of a bug that was fixed in later releases. Then we moved to Spark 1.6, but some queries were still failing when run against huge datasets. Now we are using version 2.1: it is more stable, it...
Real User
Sr. Software Engineer at a tech vendor with 1-10 employees
Oct 01 2017

What is most valuable?

The most valuable features are fault tolerance and easy integration with other workloads such as machine learning and graph analytics. The community is growing, and executing ML in a distributed fashion works quite well.

How has it helped my organization?

Previously we were using Hadoop MapReduce to reduce the Google Ngrams (3TB), which took us approximately five days on our cluster. After using Spark, we were able to accomplish this task within hours.

What needs improvement?

This product is already improving, as the community is developing it rapidly. More ML algorithms should be added to make it algorithm-rich for developers.
Find out what your peers are saying about Apache, Pivotal, IBM and others in Hadoop.
283,354 professionals have used our research since 2012.
Real User
Architect at a healthcare company with 51-200 employees
Sep 27 2017

What do you think of Apache Spark?

Valuable Features: ETL and streaming capabilities.
Improvements to My Organization: It made big data processing more convenient, and a uniform framework adds to efficiency, since the same framework can be used for batch and stream processing.
Room for Improvement: API stability (things were difficult when transitioning from RDD to DataFrames, then to Dataset).
Use of Solution: I have used Spark since its inception in March 2015, from Spark 1.1 onwards. Currently, I use 2.2 extensively.
Stability Issues: Yes, occasionally with different APIs.
Scalability Issues: No.
Customer Service and Technical Support: Since we were using the open-source version of Apache Spark, without Databricks support, we never used technical support form...
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10 2017

What do you think of Apache Spark?

Improvements to My Organization: Organisations can now harness richer data sets and benefit from use cases that add value to their business functions.
Valuable Features: Distributed in-memory processing. Some of the algorithms are resource heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
Room for Improvement: Include more machine learning algorithms and the ability to handle streaming of data rather than micro-batch processing.
Use of Solution: Three to five years.
Stability Issues: At times, when users do not know how to use Spark and request a lot of resources, the underlying JVMs can crash, which is a big source of worry.
Scalability Issues: No...
Jul 11 2018

What do you think of Apache Spark?

Primary Use Case: Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.
Improvements to My Organization: It provides a scalable machine learning library so that we can train models and predict user behavior for promotion purposes.
Valuable Features: Machine learning, real-time streaming, and data processing are fantastic, as well as the resilience (fault tolerance) feature.
Room for Improvement: I would suggest that it support more programming languages and provide an internal scheduler to schedule Spark jobs with monitoring capability.
Use of Solution: Trial/evaluations only.


What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
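
The idea can be illustrated with a minimal pure-Python sketch. This is not Spark's API: `ToyRDD` and all of its methods are hypothetical names invented here, and everything runs in a single process. The sketch only models three properties from the description above: partitions are read-only, each dataset remembers how it was derived (its lineage), and a lost partition is recomputed from that lineage while the rest of the working set stays in memory.

```python
import functools
from typing import Callable, List, Optional


class ToyRDD:
    """Toy, single-process model of a read-only partitioned dataset.

    Hypothetical illustration only; this is not the Spark API.
    """

    def __init__(self, compute: Callable[[int], tuple], num_partitions: int):
        self._compute = compute  # lineage: how to (re)build partition i
        self.num_partitions = num_partitions
        self._cache: List[Optional[tuple]] = [None] * num_partitions

    @classmethod
    def from_list(cls, items: list, num_partitions: int) -> "ToyRDD":
        # Round-robin split of the source data into immutable partitions.
        return cls(lambda i: tuple(items[i::num_partitions]), num_partitions)

    def map(self, fn: Callable) -> "ToyRDD":
        # A transformation never mutates this dataset; it returns a new one
        # whose lineage points back at this one.
        return ToyRDD(lambda i: tuple(fn(x) for x in self.partition(i)),
                      self.num_partitions)

    def partition(self, i: int) -> tuple:
        # Serve from the in-memory working set, recomputing on a miss.
        if self._cache[i] is None:
            self._cache[i] = self._compute(i)
        return self._cache[i]

    def lose_partition(self, i: int) -> None:
        # Simulate a machine failure that drops one cached partition.
        self._cache[i] = None

    def reduce(self, fn: Callable):
        # An action: combine every element across all partitions.
        return functools.reduce(
            fn,
            (x for i in range(self.num_partitions) for x in self.partition(i)),
        )


numbers = ToyRDD.from_list(list(range(1, 11)), num_partitions=3)
squares = numbers.map(lambda x: x * x)

total_before = squares.reduce(lambda a, b: a + b)
squares.lose_partition(1)  # drop part of the working set
total_after = squares.reduce(lambda a, b: a + b)  # rebuilt from lineage
```

In this toy model the second `reduce` recomputes only the dropped partition and reuses the two that are still cached, which is the contrast with a MapReduce-style job that would write its results to disk and read them back for each pass.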

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
