Apache Spark Reviews

Karthikeyan R
Real User
Principal Architect at a financial services firm with 1,001-5,000 employees
Jul 17 2019

What is most valuable?

The fast performance is the most valuable aspect of the solution.

How has it helped my organization?

I'm not sure how it has improved my organization, but I believe that it's a good product.

What needs improvement?

The search could be improved. Usually, we use other tools to search for specific things. We'll use it the way I use other tools, to get the details, but if there were a way to search for small things, that would be better. It needs a new…

If you previously used a different solution, which one did you use and why did you switch?

I was using some other systems, and we later moved to Spark. We faced performance and other issues with the other solution.

What other advice do I have?

I would recommend the solution. I would rate it an eight or nine out of ten. For some areas, I would give it a ten, but I cannot use some parts. If you are going to use it for a consumer, then I would recommend it, and you should go…
Consultant
Senior Consultant & Training at a tech services company with 51-200 employees
Oct 14 2019

What is most valuable?

The most valuable feature of this solution is its capacity for processing large amounts of data. This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

What needs improvement?

When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data. Once you are experienced, it is easier and more stable. When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.

What other advice do I have?

The work that we are doing with this solution is quite common and is very easy to do. My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things. I would rate this solution a nine out of ten.
Find out what your peers are saying about Apache, Pivotal, Informatica and others in Hadoop. Updated: September 2019.
371,639 professionals have used our research since 2012.
Snrsecengin567
Real User
Snr Security Engineer at a tech vendor with 201-500 employees
Jul 17 2019

What is our primary use case?

We primarily use the solution for security analytics.

What is most valuable?

The scalability has been the most valuable aspect of the solution.

What needs improvement?

The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive. 

For how long have I used the solution?

I've been using the solution for three years.

What do I think about the stability of the solution?

Version 2.3 is quite stable. All of our customers use it; there are roughly 100,000 or more users, and it runs 24/7.

What do I think about the scalability of the solution?

The scalability is very good.

How are customer service and technical support?

You actually buy Cloudera along with it. You don't really get…
Abhijit Nayak
Consultant
Manager | Data Science Enthusiast | Management Consultant at a consultancy with 5,001-10,000 employees
Dec 10 2017

How has it helped my organization?

Organisations can now harness richer data sets and benefit from use cases that add value to their business functions.

What is most valuable?

Distributed in-memory processing. Some of the algorithms are resource-heavy, and executing them requires a lot of RAM and CPU. With Hadoop-related technologies, we can distribute the workload across multiple commodity machines.
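To make the idea of spreading a resource-heavy job across machines concrete, here is a minimal pure-Python sketch. It is an illustration under stated assumptions, not Spark's API: threads stand in for commodity worker nodes, the data is split into contiguous partitions, each partition is processed in memory, and the partial results are combined. The names `partition`, `process_partition`, and `distributed_sum_of_squares` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, num_partitions):
    """Split data into roughly equal contiguous chunks, one per (hypothetical) worker."""
    size, remainder = divmod(len(data), num_partitions)
    chunks, start = [], 0
    for i in range(num_partitions):
        # The first `remainder` chunks take one extra element each.
        end = start + size + (1 if i < remainder else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def process_partition(chunk):
    """Per-partition work, done entirely in memory: a sum of squares."""
    return sum(x * x for x in chunk)


def distributed_sum_of_squares(data, num_workers=4):
    """Process each partition in parallel, then combine the partial results."""
    chunks = partition(data, num_workers)
    # Threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        partials = list(pool.map(process_partition, chunks))
    return sum(partials)


print(partition([1, 2, 3, 4, 5], 2))                 # [[1, 2, 3], [4, 5]]
print(distributed_sum_of_squares(list(range(10))))  # 285
```

The combine step works here because addition is associative; a real cluster framework additionally handles data locality, scheduling, and node failures, none of which this sketch models.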

What needs improvement?

Include more machine learning algorithms and the ability to handle data as a true stream rather than through micro-batch processing.
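The difference between micro-batch processing and record-at-a-time streaming can be sketched in plain Python. This is a simplified model, not Spark Streaming's API: batches here are cut by event count rather than by a time interval, and the function names `micro_batches`, `running_totals_micro_batch`, and `running_totals_per_event` are hypothetical.

```python
from typing import Iterable, Iterator, List


def micro_batches(events: Iterable[int], batch_size: int) -> Iterator[List[int]]:
    """Group an incoming event stream into fixed-size batches (count-based, for simplicity)."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    batch: List[int] = []
    for event in events:
        batch.append(event)
        if len(batch) == batch_size:
            yield batch
            batch = []
    if batch:  # flush the final, partially filled batch
        yield batch


def running_totals_micro_batch(events: Iterable[int], batch_size: int) -> List[int]:
    """Micro-batch model: one result per batch, so output waits until a batch is complete."""
    total, outputs = 0, []
    for batch in micro_batches(events, batch_size):
        total += sum(batch)
        outputs.append(total)
    return outputs


def running_totals_per_event(events: Iterable[int]) -> List[int]:
    """Record-at-a-time model: one result for every event as it arrives."""
    total, outputs = 0, []
    for event in events:
        total += event
        outputs.append(total)
    return outputs


events = [1, 2, 3, 4, 5]
print(running_totals_micro_batch(events, batch_size=2))  # [3, 10, 15]
print(running_totals_per_event(events))                  # [1, 3, 6, 10, 15]
```

Both models reach the same final total; the micro-batch version simply emits fewer, later results, which is the latency trade-off behind the request above.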

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

At times, when users do not know how to use Spark and request a lot of resources, the underlying JVMs can crash, which is a…
User
User
Jul 11 2018

What is our primary use case?

It is used to build big data platforms for processing huge volumes of data. Streaming data is also critical.

How has it helped my organization?

It provides a scalable machine learning library, so we can train models to predict user behavior for promotional purposes.

What is most valuable?

The machine learning, real-time streaming, and data processing are fantastic, as is the resilient, fault-tolerant design.

What needs improvement?

I would suggest that it support more programming languages and provide an internal scheduler for Spark jobs, with monitoring capability.

For how long have I used the solution?

Trial/evaluations only.
Sumanth Punyamurthula
Real User
Director - Data Management, Governance and Quality with 10,001+ employees
Mar 19 2019

What is our primary use case?

Ingesting billions of rows of data all day.

How has it helped my organization?

Spark on AWS is not that cost-effective, as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs too.

What is most valuable?

Powerful language.

What needs improvement?

The complicated coding required to write efficient programs is like going back to the '80s.
Rosemary Walsh
Real User
Portfolio Manager, Enterprise Solutions Architect with 10,001+ employees
Apr 11 2019

What is our primary use case?

Streaming telematics data.

How has it helped my organization?

It's a better MapReduce (MR): it supports streaming and micro-batch processing, as well as Spark ML and Spark SQL.

What is most valuable?

It supports streaming and micro-batch.

What needs improvement?

Better data lineage support.

Apache Spark Questions

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, which is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
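The contrast between MapReduce's disk-bound passes and an in-memory working set can be sketched in plain Python. This is a toy model under stated assumptions, not Spark's or Hadoop's API: a local JSON file stands in for distributed storage, a Python list stands in for a cached RDD, and the function names `count_above_mapreduce` and `count_above_working_set` are hypothetical. It ignores partitioning and fault tolerance and only counts how often the input is read from disk when the same data is queried repeatedly.

```python
import json
import os
import tempfile


def count_above_mapreduce(path, thresholds):
    """MapReduce-style: every job re-reads its input from disk before map and reduce."""
    results, disk_reads = [], 0
    for t in thresholds:
        with open(path) as f:
            records = json.load(f)  # input comes from disk on every pass
        disk_reads += 1
        flags = [1 if x > t else 0 for x in records]  # map step
        results.append(sum(flags))                    # reduce step
    return results, disk_reads


def count_above_working_set(path, thresholds):
    """Working-set style: load once, keep the data in memory, reuse it across passes."""
    with open(path) as f:
        cached = json.load(f)  # loosely analogous to caching an RDD
    disk_reads = 1
    results = [sum(1 for x in cached if x > t) for t in thresholds]
    return results, disk_reads


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "input.json")
    with open(path, "w") as f:
        json.dump([1, 5, 10, 15], f)
    print(count_above_mapreduce(path, [0, 5, 12]))    # ([4, 2, 1], 3)
    print(count_above_working_set(path, [0, 5, 12]))  # ([4, 2, 1], 1)
```

Both versions return the same answers; the difference is that iterative or interactive workloads over the same data pay the disk cost once in the working-set version instead of once per pass.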

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
