Apache Spark Reviews

Consultant
Technical Consultant at a tech services company with 1-10 employees
Dec 25 2019

What is most valuable?

I have worked with Hadoop a lot in my career and you need to do a lot of things to get it to Hello World. But in Spark it is easy. You could say it's an umbrella to do everything under the one shelf. It also has Spark Streaming. I feel the…

What needs improvement?

I think for IT people it is good. The whole idea is that Spark works pretty easily, but a lot of people, including me, struggle to set things up properly. I like contributions and if you want to connect Spark with Hadoop it's not a big…

What's my experience with pricing, setup cost, and licensing?

I would suggest not to try to do everything at once. Identify the area where you want to solve the problem, start small and expand it incrementally, slowly expand your vision. For example, if I have a problem where I need to do streaming…

Which solution did I use previously and why did I switch?

I have used MapReduce from Hadoop previously. Otherwise, I haven't used any other big data infrastructure. In my work previously, not in this company, I was working with some big data, but I was extracting using a single-core off my PC. I…

What other advice do I have?

On a scale of 1 to 10, I'd put it at an eight. To make it a perfect 10 I'd like to see an improved configuration bot. Sometimes it is a nightmare on Linux trying to figure out what happened on the configuration and back-end. So I think…
Karthikeyan R
Real User
Principal Architect at a financial services firm with 1,001-5,000 employees
Jul 17 2019

What is most valuable?

The fast performance is the most valuable aspect of the solution.

How has it helped my organization?

I'm not sure how it has improved my organization but I believe that it's a good product.

What needs improvement?

The search could be improved. Usually, we are using other tools to search for specific stuff. We'll be using it how I use other tools, to get the details, but if there is any way to search for little things, that would be better. It needs a new…

Which solution did I use previously and why did I switch?

I was using some other systems and we moved to Spark later. We faced performance and other issues with the other solution.

What other advice do I have?

I would recommend the solution. I would rate it an eight or nine out of 10. For some areas, I would give it ten but I cannot use some parts. If you are going to use it for a consumer then I would be able to recommend it and you should go…
Find out what your peers are saying about Apache, Informatica, Pivotal and others in Hadoop. Updated: January 2020.
389,772 professionals have used our research since 2012.
Consultant
Senior Consultant & Training at a tech services company with 51-200 employees
Oct 14 2019

What is most valuable?

The most valuable feature of this solution is its capacity for processing large amounts of data. This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

What needs improvement?

When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data. Once you are experienced, it is easier and more stable. When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.

What other advice do I have?

The work that we are doing with this solution is quite common and is very easy to do. My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things. I would rate this solution a nine out of ten.
Snrsecengin567
Real User
Snr Security Engineer at a tech vendor with 201-500 employees
Jul 17 2019

What do you think of Apache Spark?

What is our primary use case?

We primarily use the solution for security analytics.

What is most valuable?

The scalability has been the most valuable aspect of the solution.

What needs improvement?

The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive. 

For how long have I used the solution?

I've been using the solution for three years.

What do I think about the stability of the solution?

The 2.3 version is quite stable. All of our customers use it, there are around 100,000+ users, and it runs 24/7.

What do I think about the scalability of the solution?

The scalability is very good.

How are customer service and technical support?

You actually buy Cloudera along with it. You don't really get…
Mohamed Ghorbel
Real User
Director of BigData Offer at IVIDATA
Dec 10 2019

What is most valuable?

It is a very fast solution. It's very easy to use. There are many APIs for many languages like Scala, Java, R, and Python. The greatest advantage of Spark is that we can initiate many kinds of analytics including SQL analytics, graphics analytics, etc.

What needs improvement?

The solution needs to optimize shuffling between workers.

What other advice do I have?

We use both on-premises and public and private cloud deployment models. We're partners with Databricks. I'm a consultant. Our company works for large enterprises such as banks and energy companies. 17 of our workers use Apache Spark. With the cloud, there are many companies that integrate Spark. Most projects in big data around the world use Spark, indirectly or directly. I'd rate the solution…
User
User
Jul 11 2018

What is our primary use case?

Used for building big data platforms for processing huge volumes of data. Additionally, streaming data is critical.

How has it helped my organization?

It provides a scalable machine learning library so that we can train and predict user behavior for promotion purposes.

What is most valuable?

Machine learning, real time streaming, and data processing are fantastic, as well as the resilient or fault tolerant feature.

What needs improvement?

I would suggest that it support more programming languages and also provide an internal scheduler for Spark jobs, with monitoring capability.

For how long have I used the solution?

Trial/evaluations only.
Sumanth Punyamurthula
Real User
Director - Data Management, Governance and Quality with 10,001+ employees
Mar 19 2019

What is our primary use case?

Ingesting billions of rows of data all day.

How has it helped my organization?

Spark on AWS is not that cost-effective as memory is expensive and you cannot customize hardware in AWS. If you want more memory, you have to pay for more CPUs too in AWS.

What is most valuable?

Powerful language.

What needs improvement?

It is like going back to the '80s for the complicated coding that is required to write efficient programs.
Rosemary Walsh
Real User
Portfolio Manager, Enterprise Solutions Architect with 10,001+ employees
Apr 11 2019

What is our primary use case?

Streaming telematics data.

How has it helped my organization?

It's a better MR, supports streaming and micro-batch, and supports Spark ML and Spark SQL.

What is most valuable?

It supports streaming and micro-batch.

What needs improvement?

Better data lineage support.


Apache Spark Questions

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines, that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
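
The contrast described above can be sketched in plain Python. This is only an illustration on a single machine and does not use the Spark or Hadoop APIs: the function names (`map_words`, `reduce_counts`, `mapreduce_word_count`, `working_set_queries`) and the sample text are hypothetical. The first function follows the linear read, map, reduce, store sequence; the second loads the data once and reuses it in memory for several queries, which is roughly the role a working set plays.

```python
# Plain-Python sketch (not the Spark or Hadoop API) of the two dataflow styles.
import json
import tempfile
from collections import Counter
from functools import reduce
from pathlib import Path


def map_words(line):
    """Map step: emit a (word, 1) pair for every word in a line."""
    return [(word.lower(), 1) for word in line.split()]


def reduce_counts(acc, pair):
    """Reduce step: fold one (word, count) pair into the running totals."""
    word, count = pair
    acc[word] += count
    return acc


def mapreduce_word_count(input_path, output_path):
    """One MapReduce-style pass: read from disk, map, reduce, store to disk."""
    lines = input_path.read_text().splitlines()
    pairs = [pair for line in lines for pair in map_words(line)]
    totals = reduce(reduce_counts, pairs, Counter())
    output_path.write_text(json.dumps(totals))
    return totals


def working_set_queries(input_path):
    """Working-set style: load once, then answer several queries in memory."""
    lines = input_path.read_text().splitlines()  # the only read from disk
    pairs = [pair for line in lines for pair in map_words(line)]
    totals = reduce(reduce_counts, pairs, Counter())
    distinct_words = len(totals)        # second query, no re-read
    longest_line = max(lines, key=len)  # third query, no re-read
    return totals, distinct_words, longest_line


# Hypothetical sample input, written to a temporary directory.
with tempfile.TemporaryDirectory() as tmp:
    source = Path(tmp) / "input.txt"
    result = Path(tmp) / "counts.json"
    source.write_text("spark is fast\nhadoop is stable\nspark is easy\n")

    disk_totals = mapreduce_word_count(source, result)
    stored = json.loads(result.read_text())
    memory_totals, distinct_words, longest_line = working_set_queries(source)
```

In the first style, every additional question about the data would need another full pass that starts and ends on disk. In the second, the loaded lines are reused, which is the property the description attributes to RDDs, there in distributed and fault-tolerant form.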

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
