Apache Hadoop Review

Gives us high throughput and low latency for KPI visualization

What is our primary use case?

Data aggregation for KPIs. The sources of data come in all forms so the data is unstructured. We needed high storage and aggregation of data, in the background.

How has it helped my organization?

We start with data mashing on Hive and finally use this for KPI visualization. This intermediate step not only mashes data in the form that we want through data Cube slicing, but also helps us save states as snapshots for multiple time frames.

Without this, we would have had to plan another data source for only this purpose. Moving this step closer to processing worked better than keeping it at visualization. Although we can't completely avoid using data stores/snapshots at visualization, this step proved to be promising for getting data ready for better analytics and insights.

What is most valuable?

High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization.

What needs improvement?

At the beginning, MRs on Hive made me think we should get down to Hadoop MRs to have better control of the data. But later, Hive as a platform upgraded very well. I still think a Spark-type layer on top gives you an edge over having only Hive.

For how long have I used the solution?

Less than one year.

What other advice do I have?

I rate it an eight out of 10. It's huge, complex, slow. But does what it is meant for.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
Add a Comment

Sign Up with Email