Apache Hadoop Room for Improvement
The tool provides some configurations for dealing with data skewness, or a very uneven distribution across a diverse dataset, but in certain cases those configurations do not make any sense, and we usually have to handle the skew with a custom solution.
Spark deals with such cases efficiently. If Hadoop solved these issues the way Spark does, it could compete with Spark on the same level. Hive is a little slower than Spark: Spark does in-memory, parallel processing, while Hive does parallel processing but is not in-memory.
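A common custom workaround for data skew, in both MapReduce and Spark jobs, is key salting: split each hot key into several synthetic sub-keys so no single reducer or partition receives all of its records, then merge the partial results. A minimal sketch of the two-pass idea in plain Python (the function names and record shapes are illustrative, not from any Hadoop API):

```python
import random
from collections import defaultdict

def salted_partition(records, num_salts=4):
    """First pass: spread each key across num_salts synthetic sub-keys
    so a hot key's records land in several buckets instead of one."""
    buckets = defaultdict(list)
    for key, value in records:
        salt = random.randrange(num_salts)
        buckets[(key, salt)].append(value)
    return buckets

def merge_salted(buckets):
    """Second pass: combine the per-salt partial aggregates per real key."""
    totals = defaultdict(int)
    for (key, _salt), values in buckets.items():
        totals[key] += sum(values)
    return dict(totals)

# One "hot" key with 100x the records of a "cold" key:
records = [("hot", 1)] * 1000 + [("cold", 1)] * 10
buckets = salted_partition(records)
print(merge_salted(buckets))  # → {'hot': 1000, 'cold': 10}
```

The final totals are identical to an unsalted aggregation; only the intermediate work is spread more evenly.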
Hadoop isn't so problematic. It deals with file storage and maintenance. It is a network of file operations.
The stability of the solution needs improvement.
What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe that's because I'm new to it. Sometimes it feels tough to use, and that could come down to two things: my own incompetency (for example, I don't know all the features of Apache Hadoop) or the limitations of the platform. For example, my team maintains the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, advanced coding or programming needs to be done in the back end, so it's not user-friendly.
Buyer's Guide
Apache Hadoop
April 2024
Learn what your peers think about Apache Hadoop. Get advice and tips from experienced pros sharing their opinions. Updated: April 2024.
767,847 professionals have used our research since 2012.
reviewer2324613
Data Architect at a computer software company with 51-200 employees
The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support.
And then there's the server issue. You have to create and maintain servers on your own, which can be hectic. Sometimes, the configurations in the documentation don't work, and without a strong community to turn to, you can get stuck. That's where cloud services play a vital role.
In future releases, the community needs to be improved a lot. We need a better community, and the documentation should be more accurate for the setup process.
Sometimes, we face errors even when following the documentation for server setup and configuration. We need better support.
Even if we raise a ticket, it takes a long time to get addressed, and they don't offer online support: they ask for screenshots, which takes even more time, instead of direct screen sharing or hopping on a call. But it's free, so we can't complain too much.
Donghan Kim
R&D Head, Big Data Adjunct Professor at SK Communications Co., Ltd.
Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and other vendors.
Another weak point of this solution, technically speaking, is that it's very difficult to run and difficult to smoothly implement. Preparation and integration are important.
Integration of this solution with other data-related products and solutions, along with additional functions such as API connectivity, is what I want to see in the next release.
Tools like Apache Hadoop are knowledge-intensive by nature. Unlike other tools currently on the market, knowledge-intensive products cannot be understood straight away. Using Apache Hadoop requires intensive knowledge, which not everybody can pick up in a straightforward manner. It would be beneficial if navigating tools like Apache Hadoop were made user-friendly. If the tool were easy to navigate for non-technical users, it would be easier to use, and one might not have to depend on experts.
The load optimization capabilities of the product are an area of concern where improvements are required.
The complex setup phase can be made easier in the future.
reviewer1976262
Credit & Fraud Risk Analyst at a financial services firm with 10,001+ employees
In terms of processing speed, I believe that some of this software as well as the Hadoop-linked software can be better. While analyzing massive amounts of data, you also want it to happen quickly. Faster processing speed is definitely an area for improvement.
I am not sure about the cloud's technical aspects, whether there are things that happen in the cloud architecture that essentially make it a little slow, but speed could be one. And, second, the Hadoop-linked programs and Hadoop-linked software that are available could do much more and much better in terms of UI and UX.
As I mentioned, this is probably the only feature we can improve a little, because the terminal and coding screen on Hadoop is a little outdated; it looks like an old BIOS screen.
If the UI and UX can be improved slightly, I believe it will go a long way toward increasing adoption and effectiveness.
It has a steep learning curve. The overall Hadoop ecosystem has a large number of sub-products: there is ZooKeeper, and a whole lot of other connected components. In many cases, their functionalities overlap, and for a newcomer or for our clients, it is very difficult to decide which of them to buy and which they don't really need. They require a consulting organization for that, which is good for organizations such as ours because that's what we do, but it is not easy for end customers to gain so much knowledge and use it optimally. However, when it comes to power, I have nothing to say: it is really good.
Randy Chng
Senior Associate at a financial services firm with 10,001+ employees
The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks.
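The chunking workaround this reviewer describes can be sketched in plain Python: stream a large file in fixed-size batches so only one batch is held in memory at a time (the file name and batch size below are illustrative):

```python
def iter_chunks(path, chunk_size=100_000):
    """Yield lists of at most chunk_size lines, one chunk in memory at a time."""
    chunk = []
    with open(path) as fh:
        for line in fh:
            chunk.append(line)
            if len(chunk) >= chunk_size:
                yield chunk
                chunk = []
    if chunk:  # flush the final partial chunk
        yield chunk

# Aggregate without ever loading the whole file, e.g.:
# total_lines = sum(len(chunk) for chunk in iter_chunks("huge.log"))
```

The same pattern applies when a query result is too large for memory: page through it and fold each page into a running aggregate.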
It could be more user-friendly. Other platforms used for big data, such as Cloudera, are more user-friendly and presented in a more straightforward way. They are also more flexible than Hadoop. Hadoop's scrollback is not easy to use, either.
Yevgen Manzhulyanov
CEO at AM-BITS LLC
The solution is not easy to use. It should be easy to use and suitable for almost any case connected with big data or working with large amounts of data.
Samuel Feinberg
Analytics Platform Manager at a consultancy with 10,001+ employees
In general, Hadoop has a lot of different component parts to the platform, things like Hive and HBase, and they're all moving somewhat independently and somewhat in parallel. I think as you look to platforms in the cloud, or to walled-garden concepts like Cloudera or Azure, you see that the third party can make sure all the components work together before they are used for business purposes. That removes a layer of administration, configuration, and technical support.
I would like to see more direct integration of visualization applications.
DulalMali
Data Analytics Practice head at bse
Integrating Apache Hadoop with the many different technologies within your business can be a challenge.
Hadoop in and of itself stores data with 3x redundancy, and our organization has come to the conclusion that the default 3x replication results in too much wasted disk space. The user has the ability to change the replication standard, but I believe the Hadoop platform could eventually become more efficient in its redundant data replication. It is an organizational preference and nothing that would impede our organization from using it again, just a small thing I think could be improved.
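For reference, the cluster-wide default this reviewer refers to is controlled by `dfs.replication` in `hdfs-site.xml`; the value of 2 below is purely illustrative of trading redundancy for disk space, not a recommendation:

```xml
<!-- hdfs-site.xml: default block replication for newly written files -->
<property>
  <name>dfs.replication</name>
  <value>2</value>
</property>
```

This setting only affects files written after the change; the replication factor of existing files can be adjusted with the `hdfs dfs -setrep` shell command.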
Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission.
It seems that Hadoop is on its way out and Spark is the way to go. You can run Spark on a single machine, and it's easier to set up.
In the next release, I would like to see Hive made more responsive for smaller queries, to reduce the latency. I don't think this is viable, but if it is possible, I would like lower latency on smaller queries for analysis and analytics.
I would like a smaller version that can be run on a local machine. There are installations that do that, but they are quite difficult, so a smaller version that is easy to install and explore would be an improvement.
Arul Mani
CEO
Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them.
Yogesh Thakkar
Business data analyst at RBSG Internet operations
We have plans to increase usage, and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we face some latency issues. I think more of the solution needs to focus on parallel processing and retrieval of data.
reviewer1384338
Vice President - Finance & IT at a consumer goods company with 1-10 employees
I'm not sure if I have any ideas as to how to improve the product.
Every year, the solution comes out with new features. Spark is one new feature, for example. If they could continue to release new helpful features, it will continue to increase the value of the solution.
The solution could always improve performance. This is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance.
The solution needs a better tutorial. Only documents are currently available. There are a lot of YouTube videos available; however, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning.
We would prefer it if users didn't just get pushed toward certification-based learning, as certifications are expensive; the certification cost is currently around $2,500 or thereabouts. Maybe they could arrange for the certification to be offered at a lower cost.
MahalingamShanmugam
Works
We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it.
The Apache team is doing a great job, releasing Hadoop versions well ahead of what we can think of asking for. Every area with room for improvement is fixed as soon as a version is released by ASF. Currently, Apache Oozie 4.0.1 has some compatibility issues with Hadoop 2.5.2.
reviewer901065
Partner at a tech services company with 11-50 employees
Hadoop's security could be better.
reviewer1040328
IT Expert at a comms service provider with 1,001-5,000 employees
The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop.
Yevgen Manzhulyanov
CEO at AM-BITS LLC
What needs improvement depends on the customer and the use case. Classical Hadoop, for example, we consider an old variant; most customers now work with fast data.
There is a very wide application for this solution, but in enterprise companies, if you work with classical BI systems, it would be good to include an additional presentation layer for BI solutions.
There is a lack of virtualization and presentation layers, so you can't take it and implement it like a ready-made solution.
reviewer1040328
IT Expert at a comms service provider with 1,001-5,000 employees
We are using HDP (Hortonworks Data Platform), and I worry about the future of this product and compatibility with future releases. It's a concern because, for now, we do not have a clear upgrade path. Hadoop is now at version three, and we'd like to upgrade to that version, but as far as I know, it's not a simple thing.
There are a lot of features in this product that are open-source. If something isn't included with the distribution we are not limited. We can take things from the internet and integrate them. As far as I know, we are using Presto which isn't included in HDP (Hortonworks Data Platform) and it works fine. Not everything has to be included in the release. If something is outside of HDP and it works, that is good enough for me. We have the flexibility to incorporate it ourselves.
reviewer1433400
Technical Lead at a government with 201-500 employees
For the visualization tools, we use Apache Hadoop and it is very slow.
It lacks some query language. We have to use Apache Linux. Even so, the query language still has limitations and only a little documentation, and many of the visualization tools do not have direct connectivity. They need something like BigQuery, which is very fast. We need those to be available in the cloud and scalable.
The solution needs to be powerful and offer better availability for gathering queries.
The solution is very expensive.
Chitharanjan Billa
Database/Middleware Consultant (Currently at U.S. Department of Labor) at a tech services company with 51-200 employees
It needs better user interface (UI) functionalities.
reviewer1464630
Founder & CTO at a tech services company with 1-10 employees
I don't have any concerns because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but on the level of POCs, it's fine.
The Hadoop community is now fragmented into clusters; I think there is room for improvement in the ecosystem.
From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.
We're finding vulnerabilities in running it 24/7. We're experiencing some downtime that affects the data.
It would be good to have more advanced analytics tools.
Rolling restarts of data nodes need to be done in a way that can be further optimized. Also, I/O operations can be optimized for more performance.
It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake.
It could be because the solution is open source, and therefore not funded like bigger companies' products, but we find the solution runs slowly.
The solution isn't as mature as SQL or Oracle and therefore lacks many features.
The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment.
At the beginning, the MapReduce (MR) jobs Hive generated made me think we should drop down to raw Hadoop MRs to have better control of the data. But later, Hive as a platform improved very well. I still think a Spark-type layer on top gives you an edge over having only Hive.