Apache Spark Reviews

BigDataConsult393
Consultant
Big Data Consultant at a tech services company with 501-1,000 employees
Aug 25 2017

What do you think of Apache Spark?

What is most valuable?

The good performance. The nice graphical management console. The long list of ML algorithms.

How has it helped my organization?

We are able to solve problems, e.g., reporting on big data, that we were not able to tackle in the past.

What needs improvement?

Apache Spark provides very good performance. The tuning phase is still tricky.

For how long have I used the solution?

I've used it for 2 years.

What was my experience with deployment of the solution?

We didn't have an issue with the deployment.

What do I think about the stability of the solution?

In the past, we deployed Spark 1.3 to use Spark SQL, but unfortunately one of our queries failed because of a bug that was fixed in later releases. Then we moved to Spark 1.6, but still some queries were…
ShivanshSrivastava
Real User
Sr. Software Engineer at a tech vendor with 1-10 employees
Oct 01 2017

What is most valuable?

The most valuable feature is the fault tolerance and easy binding with other processes like machine learning and graph analytics. The community is growing and hence executing ML in a…

How has it helped my organization?

Previously we were using Hadoop MapReduce to reduce the Google Ngrams (3TB), which took us approximately five days on our cluster. After using Spark, we were able to accomplish this…

What needs improvement?

This product is already improving as the community is developing it rapidly. More ML based algorithms should be added to it, to make it algorithmic-rich for developers.

What other advice do I have?

This is a very good product for big data analytics, and it integrates well with other areas like machine learning and graph analytics.
Find out what your peers are saying about Apache Spark vs. Hortonworks Data Platform and other solutions. Updated: July 2019.
352,246 professionals have used our research since 2012.
Sumit Pal
Real User
Architect at a healthcare company with 51-200 employees
Sep 27 2017

What do you think of Apache Spark?

What is most valuable?

ETL and streaming capabilities.

How has it helped my organization?

It made big data processing more convenient. A uniform framework also adds efficiency, since the same framework can be used for both batch and stream processing.

What needs improvement?

Stability in terms of API (things were difficult when transitioning from RDD to DataFrames, then to Datasets).

For how long have I used the solution?

I have used Spark since its inception in March 2015, from Spark 1.1 onwards. Currently, I use 2.2 extensively.

What do I think about the stability of the solution?

Yes, occasionally with different APIs.

What do I think about the scalability of the solution?

No.

How is customer service and technical support?

Since we were using the Open Source…
Abhijit Nayak
Consultant
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10 2017

What do you think of Apache Spark?

How has it helped my organization?

Organisations can now harness richer data sets and benefit from use cases, which add value to their business functions.

What is most valuable?

Distributed in-memory processing. Some of the algorithms are resource heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity hardware.

What needs improvement?

Include more machine learning algorithms and the ability to handle streaming of data versus micro batch processing.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

At times when users do not know how to use Spark and request a lot of resources, then the underlying JVMs can crash, which is a…
User
User
Jul 11 2018

What do you think of Apache Spark?

What is our primary use case?

Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.

How has it helped my organization?

It provides a scalable machine learning library so that we can train and predict user behavior for promotion purposes.

What is most valuable?

Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.

What needs improvement?

I would suggest that it support more programming languages and provide an internal scheduler for scheduling Spark jobs with monitoring capability.

For how long have I used the solution?

Trial/evaluations only.
Subhasish Guha
Consultant
Big Data and Cloud Solution Consultant at a financial services firm with 10,001+ employees
Oct 02 2017

What do you think of Apache Spark?

What is most valuable?

DataFrame: Spark SQL gives the leverage to create applications more easily and with less coding effort.

How has it helped my organization?

We developed a tool for data ingestion from HDFS->Raw->L1 layer with data quality checks, putting data into Elasticsearch, and performing CDC.

What needs improvement?

Dynamic DataFrame options are not yet available.

For how long have I used the solution?

One and a half years.

What do I think about the stability of the solution?

No.

What do I think about the scalability of the solution?

No.

What other advice do I have?

Spark gives the flexibility for developing custom applications.
Sumanth Punyamurthula
Real User
Director - Data Management, Governance and Quality with 10,001+ employees
Mar 19 2019

What do you think of Apache Spark?

What is our primary use case?

Ingesting billions of rows of data all day.

How has it helped my organization?

Spark on AWS is not that cost-effective as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs too in AWS.

What is most valuable?

Powerful language.

What needs improvement?

It is like going back to the '80s for the complicated coding that is required to write efficient programs.
Rosemary Walsh
Real User
Portfolio Manager, Enterprise Solutions Architect with 10,001+ employees
Apr 11 2019

What do you think of Apache Spark?

What is our primary use case?

Streaming telematics data.

How has it helped my organization?

It's a better MR, supports streaming and micro-batch, and supports Spark ML and Spark SQL.

What is most valuable?

It supports streaming and micro-batch.

What needs improvement?

Better data lineage support.

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines and maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
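The linear map-then-reduce dataflow and the "working set" idea described above can be sketched in a few lines. This is a minimal illustration in plain, single-machine Python, not Spark's API: the sample `lines`, the `map_phase` and `reduce_phase` helpers, and the word-count task are all assumptions made up for the example.

```python
from collections import Counter
from functools import reduce

# Hypothetical in-memory input; a real MapReduce job would read this from disk.
lines = ["spark makes big data simple", "big data needs big clusters"]


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    return [(word, 1) for record in records for word in record.split()]


def reduce_phase(pairs):
    """Reduce step: sum the counts emitted for each word."""
    def combine(acc, pair):
        word, count = pair
        acc[word] += count
        return acc
    return reduce(combine, pairs, Counter())


# Linear dataflow: input -> map -> reduce -> result.
word_counts = reduce_phase(map_phase(lines))

# Working-set idea: keep the mapped pairs in memory and reuse them for
# further computations instead of re-reading and re-mapping the input.
pairs = map_phase(lines)
total_words = sum(count for _, count in pairs)
distinct_words = len(reduce_phase(pairs))
```

In a MapReduce job each stage would write its results back to disk before the next job could read them; the reuse of `pairs` stands in, very loosely, for the way an RDD lets several computations share one dataset held in memory.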

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
