What is most valuable?
There isn’t just one: the whole Hadoop stack is valuable, including the distributed file system (HDFS), Spark, Kafka, HBase, etc. Hortonworks certainly has the most up-to-date version of each Hadoop component.
Compared to the other Hadoop distributions, the Ambari server gives users an easy way to manage, administer, and configure their cluster. Ambari also provides a single view that lets you use different Hadoop components from the same web interface.
How has it helped my organization?
This product lets the organization install and configure a Hadoop cluster easily and quickly. With this cluster, the organization can store and process its data and bring out patterns in it: for example, previously unknown common points between its clients, or the key elements that increase or decrease client churn.
What needs improvement?
It would be useful to have an easy way to implement multi-tenancy for HDFS with federation. At the moment, you have to do it manually from the command line.
Also, it needs to support more than two HDFS NameNodes. HDFS supports more than two NameNodes, but Hortonworks doesn't.
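To give an idea of what the manual federation setup involves, here is a minimal Python sketch that generates the per-nameservice hdfs-site.xml properties. The property names are the standard HDFS ones, but the nameservice IDs, host names, and ports are placeholders I chose for illustration, and a real setup needs further steps (for example, formatting each NameNode with a shared cluster ID and configuring client mount tables).

```python
import xml.etree.ElementTree as ET


def federation_properties(nameservices, rpc_port=8020, http_port=50070):
    """Return hdfs-site.xml properties for a federated, non-HA HDFS layout.

    nameservices maps a nameservice ID to its NameNode host (both hypothetical).
    The ports are assumptions; check the defaults of your Hadoop version.
    """
    # One comma-separated list declares every nameservice in the federation.
    props = {"dfs.nameservices": ",".join(nameservices)}
    for ns_id, host in nameservices.items():
        # Each nameservice gets its own RPC and HTTP address, keyed by its ID.
        props[f"dfs.namenode.rpc-address.{ns_id}"] = f"{host}:{rpc_port}"
        props[f"dfs.namenode.http-address.{ns_id}"] = f"{host}:{http_port}"
    return props


def to_hdfs_site_xml(props):
    """Render the properties as a <configuration> XML string."""
    root = ET.Element("configuration")
    for name, value in props.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")
```

Calling `federation_properties({"tenant-a": "nn1.example.com", "tenant-b": "nn2.example.com"})` gives one nameservice per tenant. The output still has to be merged into the cluster configuration by hand, which is exactly the step I would like the distribution to automate.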
For how long have I used the solution?
I have worked with it on different projects and POCs for two years now.
What was my experience with deployment of the solution?
The only issue that I had was when I tried to reinstall the software on every node. You have to manually clean up everything, as Hortonworks doesn’t provide the ability to perform a clean uninstall (software, libraries, logs, configuration files, etc.). In some cases, this can cause problems if the uninstall was not done correctly.
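Before reinstalling, a dry-run check for leftovers on each node can help. The sketch below only lists what still exists and deletes nothing. The paths in `CANDIDATE_PATHS` are my assumptions about where an install may leave files; they are not an official list, so adapt them to your own nodes.

```python
from pathlib import Path

# Hypothetical locations to inspect; adjust to match your installation.
CANDIDATE_PATHS = [
    "/usr/hdp",
    "/etc/hadoop",
    "/var/log/hadoop",
    "/var/lib/ambari-agent",
]


def find_leftovers(candidates, root="/"):
    """Return the candidate paths that still exist under root.

    Dry run only: nothing is removed. The root argument makes it possible
    to point the check at a mounted image or a test directory.
    """
    base = Path(root)
    found = []
    for candidate in candidates:
        # Strip the leading slash so the path is resolved relative to root.
        path = base / candidate.lstrip("/")
        if path.exists():
            found.append(str(path))
    return found
```

Running `find_leftovers(CANDIDATE_PATHS)` on a node returns the directories that survived the uninstall, which you can then review and remove by hand.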
How are customer service and technical support?
I never had to open a case with support, so I don’t know. I have always found the answers to my questions on the web (forums or blogs). There’s a big community that can support you.
Which solution did I use previously and why did I switch?
I also used Cloudera, MapR, and Microsoft HDInsight.
How was the initial setup?
The first time, I didn’t know anything about Big Data and Hadoop, so yes, it was difficult, because I did not clearly understand what I was doing.
What about the implementation team?
The implementation was at the client's datacenter. My advice is to perform a POC on premises or in a virtual machine to learn how to use it and how to tune the configuration of each Hadoop component.
When implementing it in production, you first need a clear view of the requirements for performing the install. For example, if you are using a local repository to install the software, it has to be updated with the Hortonworks sources, especially if there are security rules (firewall access, root access limitations, etc.).
My last piece of advice is that if you have a heavy load, it is really important to implement the solution on premises, not in a virtualized environment. If you try both, you will see the difference in performance.
What's my experience with pricing, setup cost, and licensing?
Hortonworks is free to use: there’s no license, but support is available if you want it. It’s up to you to decide whether you need it (you certainly will) and perhaps to negotiate it.
Which other solutions did I evaluate?
I did not really make the choice; the client made it based on their experience, the functionality of each distribution, the privacy of the data, and the licensing/support price.
What other advice do I have?
First, perform a POC to learn and to get an idea of the load of your future applications. Then you should be able to correctly design the infrastructure you need.