Apache Spark Reviews

Consultant
Technical Consultant at a tech services company with 1-10 employees
Dec 25 2019

What is most valuable?

I have worked with Hadoop a lot in my career, and you need to do a lot of things just to get it to Hello World. In Spark, it is easy. You could say it's an umbrella that covers everything in one place. It also has Spark Streaming. I feel the…

What needs improvement?

I think for IT people it is good. The whole idea is that Spark works pretty easily, but a lot of people, including me, struggle to set things up properly. I like contributions, and if you want to connect Spark with Hadoop it's not a big…

What's my experience with pricing, setup cost, and licensing?

I would suggest not trying to do everything at once. Identify the area where you want to solve the problem, start small, and expand incrementally as your vision grows. For example, if I have a problem where I need to do streaming…

Which solution did I use previously and why did I switch?

I have used MapReduce from Hadoop previously. Otherwise, I haven't used any other big data infrastructure. In my previous work, not at this company, I was working with some big data, but I was extracting it using a single core on my PC. I…

What other advice do I have?

On a scale of 1 to 10, I'd put it at an eight. To make it a perfect 10, I'd like to see an improved configuration bot. Sometimes it is a nightmare on Linux trying to figure out what happened in the configuration and back end. So I think…
Rajendran Veerappan
Real User
Director at Nihil Solutions
Jul 29 2020

What is most valuable?

The in-memory processing engine is the solution's most valuable aspect. It processes everything extremely fast, and it runs in the cluster itself. It acts as an in-memory engine and is very effective at processing data correctly.

What needs improvement?

There are lots of items coming down the pipeline in the future. I don't know what features are missing. From my point of view, everything looks good. The graphical user interface (UI) could be a bit clearer. It's very hard to figure out…

What's my experience with pricing, setup cost, and licensing?

I'm unsure as to how much the licensing is for the solution. It's not an aspect of the product I deal with directly.

Which solution did I use previously and why did I switch?

We previously used a lot of different mechanisms. However, we needed something that was good at processing data for analytical purposes, and this solution fit the bill. It's a very powerful tool. I haven't seen other tools that could do…

What other advice do I have?

We're customers and also partners with Apache. While we are on version 2.6, we are considering upgrading to version 3.0. I'd rate the solution nine out of ten. It works very well for us and suits our purposes almost perfectly.
Find out what your peers are saying about Apache, Informatica, Cloudera and others in Hadoop. Updated: July 2020.
431,790 professionals have used our research since 2012.
Karthikeyan R
Real User
Principal Architect at a financial services firm with 1,001-5,000 employees
Jul 17 2019

What is most valuable?

The fast performance is the most valuable aspect of the solution.

How has it helped my organization?

I'm not sure how it has improved my organization but I believe that it's a good product.

What needs improvement?

The search could be improved. We usually use other tools to search for specific things. We use it the way I use other tools, to get the details, but if there were a way to search for small things, that would be better. It needs a new…

Which solution did I use previously and why did I switch?

I was using some other systems and we moved to Spark later. We faced performance and other issues with the other solution.

What other advice do I have?

I would recommend the solution. I would rate it an eight or nine out of 10. For some areas, I would give it ten, but I cannot use some parts. If you are going to use it for a consumer, then I would be able to recommend it and you should go…
Consultant
Senior Consultant & Training at a tech services company with 51-200 employees
Oct 14 2019

What is most valuable?

The most valuable feature of this solution is its capacity for processing large amounts of data. This solution makes it easy to do a lot of things. It's easy to read data, process it, save it, etc.

What needs improvement?

When you first start using this solution, it is common to run into memory errors when you are dealing with large amounts of data. Once you are experienced, it is easier and more stable. When you are trying to do something outside of the normal requirements in a typical project, it is difficult to find somebody with experience.

What other advice do I have?

The work that we are doing with this solution is quite common and is very easy to do. My advice for anybody who is implementing this solution is to look at their needs and then look at the community. Normally, there are a lot of people who have already done what you need. So, even without experience, it is quite simple to do a lot of things. I would rate this solution a nine out of ten.
Real User
Co-Founder at a tech vendor with 11-50 employees
Jan 29 2020

What do you think of Apache Spark?

What is our primary use case?

We have built a product called "NetBot." We take any form of data (large email data, images, videos, or transactional data), transform unstructured text and video data into a structured form, and create an enterprise-wide smart data grid. That smart data grid is used by downstream analytics tools. We also provide machine-building for people to get faster insight into their data.

What is most valuable?

We use all the features. We use it end to end. All of our data analysis and execution happens through Spark. The features we find most valuable are:
  • Machine learning
  • Data learning
  • Spark Analytics

What needs improvement?

We've had problems using a Python process to try to access…
Consultant
Lead Consultant at a tech services company with 51-200 employees
Jan 30 2020

What is most valuable?

The main feature that we find valuable is that it is very fast. In terms of big data, the main feature is that the data is spread across many different nodes. Whenever we use the data, Spark enables us to parse it from those different data nodes.

What needs improvement?

We use big data manager but we cannot use it as conditional data, so whenever we're trying to fetch the data, it takes a bit of time. There is some latency in the system and latency in the data caching. The main issue is that we need to design it in a way that data will be available to us very quickly. It takes a long time, and the latest data should be available to us much more quickly.

What other advice do I have?

The advice that I would give to someone considering this solution is that the quality of data has key streaming capabilities like velocity. This means how quickly you are going to refer to the data. These things matter when designing the solution. We need to take these things out. I would rate Apache Spark an eight out of ten. To make it a ten, they should improve the speed. The data storage capacity…
Snrsecengin567
Real User
Snr Security Engineer at a tech vendor with 201-500 employees
Jul 17 2019

What do you think of Apache Spark?

What is our primary use case?

We primarily use the solution for security analytics.

What is most valuable?

The scalability has been the most valuable aspect of the solution.

What needs improvement?

The management tools could use improvement. Some of the debugging tools need some work as well. They need to be more descriptive. 

For how long have I used the solution?

I've been using the solution for three years.

What do I think about the stability of the solution?

The 2.3 version is quite stable. All of our customers use it; there are more than 100,000 users, and it runs 24/7.

What do I think about the scalability of the solution?

The scalability is very good.

How are customer service and technical support?

You actually buy Cloudera along with it. You don't really get…
Mohamed Ghorbel
Real User
Director of BigData Offer at IVIDATA
Dec 10 2019

What is most valuable?

It is a very fast solution. It's very easy to use. There are many APIs for many languages, such as Scala, Java, R, and Python. The greatest advantage of Spark is that we can run many kinds of analytics, including SQL analytics, graph analytics, etc.

What needs improvement?

The solution needs to optimize shuffling between workers.

What other advice do I have?

We use both on-premises and public and private cloud deployment models. We're partners with Databricks. I'm a consultant. Our company works for large enterprises such as banks and energy companies. 17 of our workers use Apache Spark. With the cloud, there are many companies that integrate Spark. Most projects in big data around the world use Spark, indirectly or directly. I'd rate the solution…

Apache Spark Questions

What is Apache Spark?

Spark provides programmers with an application programming interface centered on a data structure called the resilient distributed dataset (RDD), a read-only multiset of data items distributed over a cluster of machines that is maintained in a fault-tolerant way. It was developed in response to limitations in the MapReduce cluster computing paradigm, which forces a particular linear dataflow structure on distributed programs: MapReduce programs read input data from disk, map a function across the data, reduce the results of the map, and store reduction results on disk. Spark's RDDs function as a working set for distributed programs that offers a (deliberately) restricted form of distributed shared memory.
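
The contrast described above can be sketched in a few lines of plain Python. This is only an illustration of the dataflow idea and does not use the Spark API. It runs on a single machine, and the sample records and the names `map_phase`, `reduce_phase`, and `working_set` are made up for the example.

```python
from collections import Counter

# Hypothetical input records standing in for a dataset stored on disk.
RECORDS = ("spark is fast", "hadoop is batch", "spark keeps data in memory")


def map_phase(records):
    """Map step: emit a (word, 1) pair for every word in every record."""
    return [(word, 1) for record in records for word in record.split()]


def reduce_phase(pairs):
    """Reduce step: sum the emitted counts for each distinct word."""
    totals = Counter()
    for word, count in pairs:
        totals[word] += count
    return dict(totals)


# MapReduce-style linear flow: read -> map -> reduce -> store.
# Each new question about the data would repeat the whole chain.
word_counts = reduce_phase(map_phase(RECORDS))

# Working-set style: build a read-only intermediate result once,
# then answer several questions from it without re-reading the input.
working_set = tuple(word for record in RECORDS for word in record.split())
distinct_words = len(set(working_set))
longest_word = max(working_set, key=len)
```

The tuple plays the role of the read-only working set here. In Spark, the corresponding collection would be partitioned across the cluster and rebuilt on failure, which this single-machine sketch does not attempt to show.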

Apache Spark customers

NASA JPL, UC Berkeley AMPLab, Amazon, eBay, Yahoo!, UC Santa Cruz, TripAdvisor, Taboola, Agile Lab, Art.com, Baidu, Alibaba Taobao, EURECOM, Hitachi Solutions
