Apache Hadoop Review

Good standard features, but a small local-machine version would be useful

What is our primary use case?

The primary use case of this solution is data engineering and data files.

The deployment model we are using is private, on-premises.

What is most valuable?

We don't use many of the Hadoop features, like Pig or Sqoop, but what I like most is using Ambari. You have to use Ambari; otherwise, it is very difficult to configure.

What comes with the standard setup is what we mostly use, but Ambari is the most important.

What needs improvement?

Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission.

It seems that Hadoop is on its way out and Spark is the way to go. You can run Spark on a single machine and it's easier to set up.

In the next release, I would like to see Hive more responsive for smaller queries, with reduced latency. I don't think that this is viable, but if it is possible, I would like lower latency on smaller queries for analysis and analytics.

I would like a smaller version that can be run on a local machine. There are installations that do that, but they are quite difficult to set up, so I would say a smaller version that is easy to install and explore would be an improvement.

For how long have I used the solution?

I have been using this solution for one year.

What do I think about the stability of the solution?

This solution is stable, but sometimes starting up can be quite a mission. With a full, proper setup, it's fine, but it's a lot of work to look after, and to start up and shut down.

What do I think about the scalability of the solution?

This solution is scalable, and I can scale it almost indefinitely.

We have approximately two thousand users. Half of the users are using it directly and another thousand are using the products and systems running on it. Fifty are data engineers, fifteen direct appliances, and the rest are business users.

How are customer service and technical support?

There are several forums on the web, and Google search works fine. There is a lot of information available and it often works.

They also have good support with regard to the implementation.

I am satisfied with the support. Generally, there is good support.

Which solution did I use previously and why did I switch?

We used the more traditional database solutions such as SAP IQ and data marts, but now it's changing more towards data science and big data.

We are a smaller infrastructure, so that's how we are set up.

How was the initial setup?

The initial setup is quite complex if you have to set it up yourself. Ambari makes it much easier, but on the cloud or local machines, it's quite a process.

It took at least a day to set it up.

What about the implementation team?

I did not use a vendor. I implemented it myself on the cloud with my local machine.

Which other solutions did I evaluate?

There was an evaluation, but the decision was to implement with a data lake and the Hortonworks Data Platform.

What other advice do I have?

It's good for what it is meant to do, which is handling a lot of big data, but it's not as good for low-latency applications.

If you have to perform quick queries in Hive or for analytics, it can be frustrating.

It can be useful for what it was intended to be used for.

I would rate this solution a seven out of ten.

Disclosure: I am a real user, and this review is based on my own experience and opinions.