
Cloudera Distribution for Hadoop Competitors and Alternatives

Get our free report covering Hewlett Packard Enterprise, Amazon, Apache, and other competitors of Cloudera Distribution for Hadoop. Updated: October 2021.
542,608 professionals have used our research since 2012.

Read reviews of Cloudera Distribution for Hadoop competitors and alternatives

MR
Manager of Process & Systems / Solutions Architect / BI Developer at HENKEL FRANCE
Real User
Top 5
Stable with good connectivity and good integration capabilities

Pros and Cons

  • "Anyone who has even a little bit of knowledge of the solution can begin to create things. You don't have to be technical to use the solution."
  • "There needs to be some simplification of the user interface."

What is our primary use case?

We primarily use the solution for data cleansing, data mining and report architecture.

What is most valuable?

I like the connectivity of the solution. We can load large data sets, share them through a small query, and then give others access.

The solution allows you to create your own cube, particularly in a shared environment, and that is very useful and extremely powerful. 

The ability to then link to Power BI or Power Query, where you have nice visualization tools, is quite useful. The solution works seamlessly with them as a package. It's fantastic.

Anyone who has even a little bit of knowledge of the solution can begin to create things. You don't have to be technical to use the solution.

What needs improvement?

Data cleansing is not intuitive or user-friendly. When things have errors, you have to hunt them down instead of the solution intuitively showing you where they are. I would recommend that they look at the Tableau Prep tool and see how it is pieced together. That's a great data cleansing tool. If Microsoft had something like that, we wouldn't even have to look at some of the other options.

There needs to be some simplification of the user interface. Right now it's too complicated.

There isn't a way to put controls on the solution, so anyone can use any part of it. Sometimes novices go and try to create things without knowing enough about what is official and what is published. It would be ideal if we could segment off certain sections so that not everyone had access to the whole solution.

I'd like to see more of a mapping tool so that you could see how the reports are connected, similar to Tableau Prep and KNIME. That would make for a pretty useful diagnostics check. People would be better able to understand the linkage between their datasets.

It would be nice if the solution offered some templates. It would make it even more plug and play, and give people a good jumping-off point. After that, they could explore other bells and whistles as they get further into understanding the solution.

The solution should support some form of virtualization. It would be a good added feature.

If this product had those things then I wouldn't need to use other products.

For how long have I used the solution?

I've been using the solution for about a year.

What do I think about the stability of the solution?

Microsoft is pretty good, and the solution is well embedded. The only thing that causes issues is when they come out with enhancements: they tend to show you where they put the buttons only after the fact. Other than that, it's pretty stable. They get the bugs out before they release and then continue to improve on it. It's relatively painless.

What do I think about the scalability of the solution?

The solution is extremely scalable. My organization is moving away from the other solutions because of the cost and the complexity of implementing them, training people, and then maintaining a knowledgeable workforce. They find that this solution, together with Power BI, has a smaller learning curve and a better price point, and, of course, you can scale it. You can give it to as many users as you like without burning a hole in your wallet and without having to retrain an entire staff of analysts or admin types.

We have about 2,500 people in the US using the solution, and maybe 50,000 globally.

We'll be making this an official platform and increasing usage in the future.

How are customer service and technical support?

Technical support is not that great. It's more like a study session than support. Hopefully, they can work on making it better in the future.

Which solution did I use previously and why did I switch?

We had a combination of Tableau with KNIME as the back-end to clean the data, and we also used Cloudera to transact between systems. This setup required a lot of human resources and know-how, and even with all that, it was only mostly functional; when it wasn't, that was a big deal. We've since moved to this solution, which I believe is a simpler platform. However, streamlining can be a double-edged sword. If it's managed well, I believe using this solution will improve overall efficiency.

How was the initial setup?

The initial setup is totally straightforward. It's already built into your normal Office Suite. All you have to do is know where to find it and then, obviously, you have to know how to use it. 

However, as far as installation goes, the ease of getting into it is great. It's so readily available and there's information online to educate yourself on the product. It's a great product in that sense. It's an open toolbox for anyone who has the willingness to understand the functionality.

What's my experience with pricing, setup cost, and licensing?

I'm not familiar with the cost of licensing, but I know that their basic model is probably free, and it covers 90% of any novice's needs. The license packages have, from my experience, just a few more bells and whistles, particularly when it comes to publications. The pricing is very fair and super cheap.

What other advice do I have?

We are using the standard 34-bit version of the solution that comes with most standard Office and Excel packages.

The solution offers the bare basics but it's everything that you would need to know with regard to data analytics and key reporting techniques. If you can manage Power BI, and other tools, you'll have more enhanced capabilities, but the same logic applies. It's a great tool for newer users who are trying to get involved with data analysis.

I stayed away from it originally because it looked very complicated, I had a bad experience with Access, and I thought it was just Microsoft trying to push Access with a different face. However, they really came through and rethought everything, and once you start exploring the solution, you realize that they have really simplified it so that it's easy to get into.

I'd give the solution a solid eight out of ten. It's easy to use, and it offers good pricing and very good flexibility. It just falls short on some of the fancier extras that the more expensive solutions have. However, once you get into the paid versions, there may be more features.

Which deployment model are you using for this solution?

Hybrid Cloud
Disclosure: I am a real user, and this review is based on my own experience and opinions.

SA
Technical Consultant at a tech services company with 1-10 employees
Consultant
Top 20
Good streaming features that enable data ingestion and analysis within Spark Streaming

Pros and Cons

  • "I feel the streaming is its best feature."
  • "When you want to extract data from your HDFS and other sources then it is kind of tricky because you have to connect with those sources."

What is our primary use case?

We are working with a client that has a wide variety of data, some of it residing in other structured databases. The idea is to first build a database in Hadoop, one place for all kinds of data, which we are in the process of doing right now. Then we are going to use Spark.

What is most valuable?

I have worked with Hadoop a lot in my career, and you need to do a lot of things just to get it to Hello World. In Spark, it is easy. You could say it's an umbrella for doing everything in one place. It also has Spark Streaming. I feel the streaming is its best feature because I have been able to ingest data and analyze it within Spark Streaming.

What needs improvement?

I think for IT people it is good. The whole idea is that Spark works pretty easily, but a lot of people, including me, struggle to set things up properly. I like the contributions, and connecting Spark with Hadoop is not a big deal, but for other things, such as using Sqoop with Spark, you need to do the configuration by hand. I wish there were a solution that handled all of these configurations, like in Windows, where you get the whole solution and it takes care of the back-end. I think that kind of solution would help. Still, it can do everything a data scientist needs.

Spark's main objective is to manipulate and calculate. It is playing with the data. So it has to keep doing what it does best and let the visualization tool do what it does best.

Overall, it offers everything that I can imagine right now. 

For how long have I used the solution?

I have been using Apache Spark for a couple of months.

What do I think about the stability of the solution?

In terms of stability, I have not seen any bugs, glitches, or crashes. Even if there were, that would be fine, because I would probably take care of them and progress further in the process.

What do I think about the scalability of the solution?

I have not tested the scalability yet.

In my company, there are two or three people that are using it for different products. But right now, the client I'm engaged with doesn't know anything about Spark or Hadoop. They are a typical financial company so they do what they do, and they ask us to do everything. They have pretty much outsourced their whole big data initiative to us.

Which solution did I use previously and why did I switch?

I have used MapReduce from Hadoop previously. Otherwise, I haven't used any other big data infrastructure.

In my previous work, at a different company, I was working with some big data, but I was extracting it using a single core of my PC. Over time I realized that my system had eight cores, so I switched to multi-core programming to use all of them. Then I realized that Hadoop and Spark do the same thing, but across different machines. That was when I used multi-core programming, and that's the point: Spark needs to go and search Hadoop and other things.
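
The progression described above, from one core to all eight cores to many machines, is the same split, map, and merge idea that MapReduce and Spark apply across a cluster. Below is a minimal single-machine sketch in Python, assuming a simple word-count job. The function names (`partition`, `count_partition`, `word_count`) and the word-count task are illustrative only and do not come from the review. The data is split into chunks, each chunk is counted independently (the "map" step), and the partial counts are merged (the "reduce" step).

```python
from collections import Counter
from concurrent.futures import ProcessPoolExecutor
from functools import reduce
from operator import add
from typing import Callable, List


def partition(words: List[str], n: int) -> List[List[str]]:
    """Split the word list into at most n roughly equal chunks (one per core)."""
    size = max(1, -(-len(words) // n))  # ceiling division, at least 1
    return [words[i:i + size] for i in range(0, len(words), size)]


def count_partition(chunk: List[str]) -> Counter:
    """Map step: count the words in one chunk, independently of the others."""
    return Counter(chunk)


def word_count(text: str, n: int = 8, mapper: Callable = map) -> Counter:
    """Split the text, map over the chunks, then reduce by merging the counts.

    With the default built-in `map`, this runs serially on one core; passing a
    parallel map (e.g. ProcessPoolExecutor.map) spreads the map step over cores.
    """
    chunks = partition(text.lower().split(), n)
    partial_counts = mapper(count_partition, chunks)
    return reduce(add, partial_counts, Counter())


if __name__ == "__main__":
    sample = "spark runs on hadoop and spark streams data " * 1000
    # Same job, but the map step runs in worker processes (one per core by default).
    with ProcessPoolExecutor() as pool:
        counts = word_count(sample, n=8, mapper=pool.map)
    print(counts.most_common(3))
```

The `mapper` parameter is a design choice for this sketch: the split/map/merge logic stays identical whether it runs serially, on local cores, or (conceptually) on a cluster, which is the point the reviewer makes about Hadoop and Spark doing "the same thing but with different PCs."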

How was the initial setup?

The initial setup to get it to Hello World is pretty easy; you just have to install it. But when you want to extract data from your HDFS and other sources, it is kind of tricky because you have to connect to those sources. However, you can get a lot of help from different sources on the internet, so it's great. A lot of people are doing it.

I work for a startup company. In startups, you do not have the luxury of different people doing different things; you have to do everything on your own, and it's an opportunity to learn everything. In a typical corporation or large organization, you have restrictive SOPs and have to work within those boundaries. In my organization, I have to set up everything, configure it, and work on it myself.

What's my experience with pricing, setup cost, and licensing?

I would suggest not to try to do everything at once. Identify the area where you want to solve the problem, start small and expand it incrementally, slowly expand your vision. For example, if I have a problem where I need to do streaming, just focus on the streaming and not on the machine learning that Spark offers. It offers a lot of things but you need to focus on one thing so that you can learn. That is what I have learned from the little experience I have with Spark. You need to focus on your objective and let the tools help you rather than the tools drive the work. That is my advice.

What other advice do I have?

On a scale of 1 to 10, I'd put it at an eight.

To make it a perfect 10, I'd like to see an improved configuration bot. Sometimes it is a nightmare on Linux trying to figure out what happened in the configuration and back-end, so I think installation and configuration alongside other tools should be improved. We are technical people and can figure it out, but if aspects like that were improved, then less technical people would use it, and it would be more adaptable to end users.

Disclosure: I am a real user, and this review is based on my own experience and opinions.