Apache Hadoop Overview

Apache Hadoop is the #3 ranked solution in our list of top Data Warehouse tools. It is most often compared to Snowflake.

What is Apache Hadoop?
The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
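One of the "simple programming models" commonly associated with Hadoop is MapReduce. As a rough illustration only, the plain-Python sketch below mimics the map, shuffle, and reduce steps of a word count on a single machine. It does not use any Hadoop API, it does not distribute work or handle failures, and the function names (map_phase, shuffle, reduce_phase, word_count) are hypothetical names chosen for this example.

```python
from collections import defaultdict


def map_phase(record):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in record.split()]


def shuffle(pairs):
    """Shuffle step: group all emitted values by their key."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values collected for one key."""
    return key, sum(values)


def word_count(records):
    """Run map, shuffle, and reduce in sequence on an in-memory list of lines."""
    pairs = [pair for record in records for pair in map_phase(record)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())
```

In a real cluster, the map and reduce steps would run in parallel on many machines, each working on the data stored locally; here they run one after another purely to show the shape of the model.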
Apache Hadoop Buyer's Guide

Download the Apache Hadoop Buyer's Guide including reviews and more. Updated: March 2021

Apache Hadoop Customers
Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
Pricing Advice

What users are saying about Apache Hadoop pricing:
  • "This is a low cost and powerful solution."

Apache Hadoop Reviews

reviewer1384338
Vice President - Finance & IT at a consumer goods company with 1-10 employees
Real User
Jul 15, 2020
Great micro-partitions, helpful technical support and quite stable

What is our primary use case?

As an example of a use case, when I was a contractor for Cisco, we were processing mobile network data and the volume was too big. RDBMS was not supporting anything. We started using the Hadoop framework to improve the process and get the results faster.

Pros and Cons

  • "The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so."
  • "The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."

What other advice do I have?

We're just a customer. We don't have a business relationship with Hadoop. My day-to-day job is data modeling and architecting. Originally we used it as an open-source solution. We downloaded it, then we went for a commercial version of it. In terms of advice, I'd tell other potential users that whether the solution is right for them depends on a few items. If the data volume is too big, it's IoT data, or the stream of data is too much, this solution can handle it and I would definitely recommend Apache Hadoop. Recently, in the last 18 months, I've been working with Snowflake; it's a Data…
ITexp677
IT Expert at a comms service provider with 1,001-5,000 employees
Real User
Jul 29, 2019
An inexpensive and flexible suite that helps users integrate varied legacy systems

What is our primary use case?

We primarily use this product to integrate legacy systems.

Pros and Cons

  • "The best thing about this solution is that it is very powerful and very cheap."
  • "The upgrade path should be improved because it is not as easy as it should be."

What other advice do I have?

I would give this product a rating of eight out of ten. It would not be a ten out of ten because of some problems we are having with the upgrade to the newer version. It would have been better for us if these problems were not holding us back. I think eight is good enough.
reviewer860583
Data Scientist at a tech vendor with 501-1,000 employees
Real User
Top 5 Leaderboard
Sep 30, 2019
Good standard features, but a small local-machine version would be useful

What is our primary use case?

The primary use case of this solution is data engineering and data files. The deployment model we are using is private, on-premises.

Pros and Cons

  • "What comes with the standard setup is what we mostly use, but Ambari is the most important."
  • "In the next release, I would like to see Hive more responsive for smaller queries and to reduce the latency."

What other advice do I have?

It's good for what it is meant to do, which is handling a lot of big data, but it's not as good for low-latency applications. If you have to perform quick queries on Hive or analytics, it can be frustrating. It can be useful for what it was intended to be used for. I would rate this solution a seven out of ten.
reviewer1464630
Founder & CTO at a tech services company with 1-10 employees
Real User
Dec 15, 2020
Processes large data sets across clusters of computers

What is our primary use case?

We mainly use Apache Hadoop for real-time streaming and integration, using Spark Streaming and the ecosystem of Spark technologies inside Hadoop.

Pros and Cons

  • "Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
  • "From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."

What other advice do I have?

Usually, people need to study and prepare for a few use cases and compare multiple ecosystems before choosing one. When people think of using a big data solution, Hadoop comes to mind. For certain use cases, Hadoop is comparable with other technologies. For example, when building a sort of real-time data warehouse (an enterprise data hub), people don't think about using Hadoop directly. People often use solutions like DROID for building it. At the end of the day, you need to compare existing technologies against their use cases. You need to study your use case and select the…
Yevgen Manzhulyanov
CEO at AM-BITS LLC
Real User
Nov 27, 2019
Good stability and scalability but the visualization isn't good

What is our primary use case?

We primarily use the solution for the enterprise data hub and big data warehouse extension.

Pros and Cons

  • "The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
  • "There is a lack of visualization and presentation layers, so you can't take it and implement it like a ready-made solution."

What other advice do I have?

We use the on-premises deployment model. It's a requirement for the company we work with, which is a bank. Often customers demand we work with on-premises deployment models. I'd rate the solution seven out of ten. In terms of the ability to build middleware and offer scalability, it would be 10 out of 10 from me. However, if you take into account only the visualization, I'd only rate it at three or four out of ten.
reviewer1433400
Technical Lead at a government with 201-500 employees
Real User
Oct 20, 2020
Good distributed processing and performance, but very expensive

Pros and Cons

  • "The performance is pretty good."
  • "The solution is very expensive."

What other advice do I have?

The solution is perfect for those dealing with a huge amount of data. Still, you need to check to make sure it meets your company's requirements. You need to understand them before actually choosing the technology you'll ultimately use. Overall, I would rate the solution at a seven out of ten.
YogeshThakkar
Technical Architect at RBSG Internet Operations
Real User
Dec 17, 2019
Good database and highly scalable, with good plug and play analytics tools

What is our primary use case?

We are primarily dumping all the prior payment transaction data into a Hadoop system and then we use some of the plug and play analytics tools to translate it.

Pros and Cons

  • "The most valuable feature is the database."
  • "It would be good to have more advanced analytics tools."

What other advice do I have?

We use the on-premises deployment model. We're more inclined towards an operational data source to fill our customer's needs. Hadoop is good for analytics and some reporting requirements. It's a good solution for those needing something for the purposes of management reporting. I'd rate the solution eight out of ten.
MukundMishra
Practice Lead (BI/ Data Science) at a tech services company with 11-50 employees
Real User
Top 20
Dec 16, 2019
Good for managing and replicating big data but needs a better user interface

Pros and Cons

  • "It's good for storing historical data and handling analytics on a huge amount of data."
  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."

What other advice do I have?

I've used the solution under cloud, hybrid and on-premises deployment models. I'd recommend the solution, but it depends on the company's requirements. If you don't have huge amounts of data, you probably don't need Hadoop. If you need a completely private environment, and you have lots of big data, consider Hadoop. You don't even need to invest in the infrastructure as you can just use a cloud deployment. I'd rate the solution seven out of ten. I'd rate it higher if it had a better user interface.