Apache Hadoop: Room for Improvement

Vice President - Finance & IT at a consumer goods company with 1-10 employees
I'm not sure I have any ideas for improving the product. Every year, the solution comes out with new features; Spark is one example. If they continue to release helpful new features, the value of the solution will keep increasing. The solution could always improve performance; this is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance. The solution needs a better tutorial. Currently, only documents are available. There are a lot of YouTube videos available; however, we didn't have great success trying to learn that way. There needs to be better self-paced learning. We would prefer that users not be pushed toward certification-based learning, as certifications are expensive. Perhaps they could arrange for the certification to cost less; it is currently around $2,500 or thereabouts.
IT Expert at a comms service provider with 1,001-5,000 employees
We are using HDTM circuit boards, and I worry about the future of this product and its compatibility with future releases. It's a concern because, for now, we do not have a clear path to upgrade. There is a third version of the Hadoop product, and we'd like to upgrade to it, but as far as I know, that's not a simple thing. There are a lot of features in this product that are open source. If something isn't included with the distribution, we are not limited; we can take things from the internet and integrate them. As far as I know, we are using Presto, which isn't included in HDP (Hortonworks Data Platform), and it works fine. Not everything has to be included in the release. If something is outside of HDP and it works, that is good enough for me. We have the flexibility to incorporate it ourselves.
Data Scientist at a tech vendor with 501-1,000 employees
Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission. It seems that Hadoop is on its way out and Spark is the way to go. You can run Spark on a single machine, and it's easier to set up. In the next release, I would like to see Hive made more responsive for smaller queries, with reduced latency. I'm not sure this is viable, but if possible, lower latency on smaller queries for analysis and analytics would help. I would also like a smaller version that can be run on a local machine. There are installations that do that, but they are quite difficult, so a smaller version that is easy to install and explore would be an improvement.
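The reviewer's point that Spark is easier to run on a single machine can be sketched as follows. This is a minimal illustration, not the reviewer's setup: Spark's `local[*]` master runs in-process executors on all local cores, so no Hadoop/YARN cluster is needed. The `$SPARK_HOME` path and the script name `my_job.py` are illustrative assumptions.

```shell
# Option 1: make single-machine local mode the default master.
# "local[*]" = run Spark in one JVM, using all available cores.
echo "spark.master  local[*]" >> "$SPARK_HOME/conf/spark-defaults.conf"

# Option 2: pass the master explicitly for a single job
# (my_job.py is a hypothetical application script).
"$SPARK_HOME/bin/spark-submit" --master "local[*]" my_job.py
```

Either approach avoids standing up HDFS or YARN at all, which is what makes the single-machine Spark experience simpler than a single-node Hadoop install.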
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: January 2021.
456,495 professionals have used our research since 2012.
Founder & CTO at a tech services company with 1-10 employees
I don't have any concerns, because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but at the level of POCs, it's fine. The community around Hadoop is now quite a cluster of projects, and I think there is room for improvement in the ecosystem. From the Apache perspective, or that of the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.
Yevgen Manzhulyanov
What needs improvement depends on the customer and the use case. The classical Hadoop, for example, we consider an old variant; most now work with flash data. There is a very wide application for this solution, but in enterprise companies, if you work with classical BI systems, it would be good to include an additional presentation layer for BI solutions. There is a lack of virtualization and presentation layers, so you can't take it and implement it as a ready-made solution.
Technical Lead at a government with 201-500 employees
For the visualization tools, we use Apache Hadoop, and it is very slow. It lacks a query language; we have to use Apache Linux. Even so, the query language still has limitations, with only a bit of documentation, and many of the visualization tools do not have direct connectivity. They need something like BigQuery, which is very fast. We need those capabilities to be available in the cloud and scalable. The solution needs to be powerful and offer better availability for handling queries. The solution is also very expensive.
Technical Architect at RBSG Internet Operations
We're finding vulnerabilities in running it 24/7, and we're experiencing some downtime that affects the data. It would be good to have more advanced analytics tools.
Practice Lead (BI/ Data Science) at a tech services company with 11-50 employees
It could be because the solution is open source, and therefore not funded like bigger companies' products, but we find the solution runs slowly. The solution isn't as mature as SQL or Oracle and therefore lacks many features. The solution could use a better user interface; it needs a more effective GUI in order to create a better user environment.
We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it.
Co-Founder at a tech services company with 201-500 employees
It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake.