There’s not only one, the all-stack of Hadoop is valuable, the distributed file system HDFS, Spark, Kafka, HBase, etc. Hortonworks has certainly got the most up-to-date version of each component of Hadoop.
Compared to the other Hadoop distributions, the Ambari server provides the user an easy way to manage, to administrate and to configure their cluster. Ambari also provides a single view that gives you the possibility to use different Hadoop components from the same web interface.
Improvements to My Organization
This product gives the possibility to the organization to easily and quickly install and configure a Hadoop cluster. With this cluster, the organization will be able to store and process their data and bring out some specificity on it. For example, unknown common points between their clients or key elements that will increase or decrease the churn of the client.
Room for Improvement
It would be interesting to have an easy way to implement multi-tenant for HDFS with federation. At the moment, you have to do it manually in command line.
Also, it needs to support having more than two HDFS namenodes. HDFS supports more than 2 namenodes, but Hortonworks doesn't.
Use of Solution
I work with it in different projects and POCs for two years now.
The only issue that I had was when I tried to reinstall the software on every node. You have to manually clean up everything, as Hortonworks doesn’t provide the ability to perform a clean uninstall (software, library, log, configuration files, etc). In some case, it can generate some problems if the uninstall has not done correctly.
Customer Service and Technical Support
I never had to create a case at the support, so I don’t know. I always find the answers to my questions on the web (forum or blog). There’s a big community that can support you.
I also used Cloudera, MapR, and Microsoft HD Insight.
The first time, I didn’t know anything about Big Data and Hadoop, so yes it was difficult because I did not clearly understand what I was doing.
The implementation was at the clients datacenter. My advice is to perform a POC on premise or via a virtual machine to learn how to use it and how to tune the configuration of each Hadoop component.
When implementing it in production, firstly you need to have a clear view of the requirements you need to perform the install. For example, if you are using a local repository to install the software, it has to be updated with Hortonworks sources, especially if there are security rules (firewall access, root access limitation, etc.).
My last piece of advice is that if you have a heavy load, it is really important to implement the solution on premise, not in a virtualized environment. If you do both, you will see the difference in performance.
Pricing, Setup Cost and Licensing
The use of Hortonworks is free there’s no license but if you want there’s a support. It’s up to you to see if you need it (certainly) and to maybe negotiate it.
Other Solutions Considered
I did not really made the choice, as the client made it dependent on their experience, functionality of each distribution, privacy of the data and the licensing/support price.
Firstly perform a POC to learn and to get an idea of the load of your future applications. Then, you should be able to correctly design the need infrastructure.
Disclosure: My company has a business relationship with this vendor other than being a customer: We are partners.
Nov 10 2015