Apache Hadoop Room for Improvement

Syed Afroz Pasha - PeerSpot reviewer
Head Of Data Governance at Alibaba Group

The tool provides functionality for dealing with data skewness or a diverse set of data, and it offers some configurations for this. In certain cases, though, the configurations for dealing with data skewness do not make any sense, and we usually have to handle it with a custom solution.

Spark deals with such cases efficiently. If Hadoop solved these issues the way Spark does, it could compete with Spark on the same level. Hive is a little slower than Spark: Spark does in-memory parallel processing, while Hive does parallel processing but not in memory.
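One common shape such a custom workaround takes (not necessarily this reviewer's) is salting the hot keys so an aggregation or join spreads across many tasks instead of piling onto one. A minimal PySpark sketch; the table path and column names are hypothetical.

```python
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("skew-salting-sketch").getOrCreate()

events = spark.read.parquet("/data/events")   # hypothetical skewed fact table
SALT_BUCKETS = 16

# Attach a random salt to every row so one hot customer_id is split
# across up to SALT_BUCKETS reduce tasks.
salted = events.withColumn("salt", (F.rand() * SALT_BUCKETS).cast("int"))

# Aggregate per (key, salt) first, then collapse the salt: two modest
# shuffles instead of one shuffle dominated by a single hot key.
partial = (salted.groupBy("customer_id", "salt")
                 .agg(F.sum("amount").alias("partial_sum")))
result = (partial.groupBy("customer_id")
                 .agg(F.sum("partial_sum").alias("total_amount")))
```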

View full review »
Miodrag Milojevic - PeerSpot reviewer
Senior Data Archirect at Yettel

Hadoop itself isn't so problematic. It deals with file storage and maintenance; essentially, it is a network of file operations.

The stability of the solution needs improvement.

View full review »
Juliet Hoimonthi - PeerSpot reviewer
Manager at Robi Axiata Limited

What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe that's because I'm new to it. Sometimes it feels very tough to use, and that could come down to two things: one is my own inexperience, for example not knowing all of Apache Hadoop's features, and the other is the limitations of the platform. For example, my team maintains the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly.
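To illustrate the kind of back-end work the reviewer describes: glossary changes in Apache Atlas are typically made through its REST API rather than the GUI. A minimal sketch, assuming an Atlas 2.x instance at a hypothetical host on the default port 21000 with basic authentication and placeholder credentials.

```python
import requests

ATLAS = "http://atlas.example.com:21000"   # hypothetical host
AUTH = ("admin", "admin")                  # placeholder credentials

# Create a glossary through the v2 glossary resource instead of the UI.
payload = {
    "name": "FinanceGlossary",
    "shortDescription": "Business terms maintained by the data team",
}
resp = requests.post(f"{ATLAS}/api/atlas/v2/glossary", json=payload, auth=AUTH)
resp.raise_for_status()
print("created glossary with guid:", resp.json()["guid"])
```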

View full review »
GM
Data Architect at a computer software company with 51-200 employees

The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support. 

And then there's the server issue. You have to create and maintain servers on your own, which can be hectic. Sometimes, the configurations in the documentation don't work, and without a strong community to turn to, you can get stuck. That's where cloud services play a vital role.

Going forward, the community needs to improve a lot. We need a better community, and the documentation should be more accurate for the setup process.

Sometimes, we face errors even when following the documentation for server setup and configuration. We need better support. 

Even if we raise a ticket, it takes a long time to get addressed, and they don't offer online support. They ask for screenshots instead of direct screen sharing or hopping on a call, which takes even more time. But it's free, so we can't complain too much.

View full review »
DK
R&D Head, Big Data Adjunct Professor at SK Communications Co., Ltd.

Apache Hadoop's real-time data processing is weak and is not enough to satisfy our customers, so we may have to pick other products. We are continuously researching other solutions and other vendors.

Another weak point of this solution, technically speaking, is that it's very difficult to run and difficult to smoothly implement. Preparation and integration are important.

What I want to see in the next release is integration of this solution with other data-related products and solutions, plus additional functions, e.g., API connectivity.
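For context on the API-connectivity point, Hadoop does already expose one REST surface, WebHDFS, that other systems can integrate with over plain HTTP. A minimal sketch, assuming a Hadoop 3.x NameNode whose default HTTP port (9870) is reachable at a hypothetical hostname and an HDFS path that exists.

```python
import requests

NAMENODE = "http://namenode.example.com:9870"   # hypothetical NameNode address

# List an HDFS directory over HTTP instead of going through the Java client.
resp = requests.get(
    f"{NAMENODE}/webhdfs/v1/data/landing",
    params={"op": "LISTSTATUS"},
)
resp.raise_for_status()

for entry in resp.json()["FileStatuses"]["FileStatus"]:
    print(entry["pathSuffix"], entry["length"])
```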

View full review »
Anand Viswanath - PeerSpot reviewer
Project Manager at Unimity Solutions

Tools like Apache Hadoop are knowledge-intensive in nature. Unlike other tools currently on the market, a knowledge-intensive product cannot be understood straight away. To use Apache Hadoop, a person needs in-depth knowledge, which is not something everybody can get familiar with easily. It would be beneficial if navigating tools like Apache Hadoop were made user-friendly. If the tool were easier to navigate for non-technical users, it would be easier to use, and one would not have to depend on experts.

The load optimization capabilities of the product are an area of concern where improvements are required.

The complex setup phase can be made easier in the future.

View full review »
AM
Credit & Fraud Risk Analyst at a financial services firm with 10,001+ employees

In terms of processing speed, I believe that this software, as well as the Hadoop-linked software, can be better. When analyzing massive amounts of data, you also want it to happen quickly. Faster processing speed is definitely an area for improvement.

I am not sure about the cloud's technical aspects, that is, whether something in the cloud architecture essentially makes it a little slow, but speed could be one area. Second, the Hadoop-linked programs and software that are available could do much more, and much better, in terms of UI and UX.

As I mentioned, this is probably the only feature we can improve a little bit, because the terminal and coding screen on Hadoop is a little outdated and looks like the old C++ bio screen.

If the UI and UX can be improved slightly, I believe it will go a long way toward increasing adoption and effectiveness.

View full review »
Abhik Ray - PeerSpot reviewer
Co-Founder at Quantic

It involves a steep learning curve. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities overlap, and for a newcomer or for our clients, it is very difficult to decide which of them to buy and which they don't really need. They require a consulting organization for this, which is good for organizations such as ours because that's what we do, but it is not easy for end customers to gain that much knowledge and use it optimally. However, when it comes to power, I have nothing to say. It is really good.

View full review »
RC
Senior Associate at a financial services firm with 10,001+ employees

The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks.
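A minimal sketch of that chunking workaround, assuming a Hive table partitioned by a hypothetical event_date column and the PyHive client; the host, table, and column names are illustrative only.

```python
from pyhive import hive

conn = hive.Connection(host="hive-server.example.com", port=10000, username="etl")
cursor = conn.cursor()

# Hypothetical list of partitions to process one at a time.
dates = ["2024-04-01", "2024-04-02", "2024-04-03"]

totals = {}
for d in dates:
    # Each query touches a single partition, so the working set stays
    # small enough to fit in memory.
    cursor.execute(
        "SELECT customer_id, SUM(amount) AS amt FROM transactions "
        "WHERE event_date = %(d)s GROUP BY customer_id",
        {"d": d},
    )
    for customer_id, amt in cursor.fetchall():
        totals[customer_id] = totals.get(customer_id, 0) + amt
```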

View full review »
Aria Amini - PeerSpot reviewer
Data Engineer at Behsazan Mellat

It could be more user-friendly. Other platforms, such as Cloudera, used for big data, are more user-friendly and presented in a more straightforward way. They are also more flexible than Hadoop. Hadoop's scrollback is not easy to use, either.

View full review »
YM
CEO at AM-BITS LLC

The solution is not easy to use. It should be easy to use and suitable for almost any case connected with big data or working with large amounts of data.

View full review »
SF
Analytics Platform Manager at a consultancy with 10,001+ employees

In general, Hadoop has a lot of different component parts to the platform - things like Hive and HBase - and they're all moving somewhat independently and somewhat in parallel. I think as you look to platforms in the cloud or to walled-garden concepts, like Cloudera or Azure, you see that the third party can make sure all the components work together before they are used for business purposes. That removes a layer of administration, configuration, and technical support.

I would like to see more direct integration of visualization applications.

View full review »
DM
Data Analytics Practice head at bse

Integrating Apache Hadoop with lots of different techniques within your business can be a challenge.

View full review »
it_user340983 - PeerSpot reviewer
Infrastructure Engineer at Zirous, Inc.

Hadoop in and of itself stores data with 3x redundancy, and our organization has come to the conclusion that the default 3x results in too much wasted disk space. The user has the ability to change the data replication standard, but I believe that the Hadoop platform could eventually become more efficient in its redundant data replication. It is an organizational preference and nothing that would impede our organization from using it again, just a small thing I think could be improved.
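For reference, the default the reviewer mentions is the dfs.replication property in hdfs-site.xml (3 unless overridden), and the replication factor of existing paths can be changed with the hdfs dfs -setrep command. A minimal sketch that shells out to the Hadoop CLI; the target path is hypothetical.

```python
import subprocess

# Reduce replication to 2 for an archive directory; -w waits until the
# NameNode has finished dropping the surplus block copies.
subprocess.run(
    ["hdfs", "dfs", "-setrep", "-w", "2", "/data/archive"],
    check=True,
)
```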

View full review »
Lucas Dreyer - PeerSpot reviewer
Data Engineer at BBD

Hadoop itself is quite complex, especially if you want it running on a single machine, so getting it set up is a big mission.

It seems that Hadoop is on its way out and Spark is the way to go. You can run Spark on a single machine, and it's easier to set up.

In the next release, I would like to see Hive made more responsive for smaller queries, with reduced latency. I don't think this is viable, but if it is possible, I would like lower latency on smaller queries for analysis and analytics.

I would like a smaller version that can be run on a local machine. There are installations that do that but are quite difficult, so I would say a smaller version that is easy to install and explore would be an improvement.
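For contrast, the single-machine Spark setup the reviewer refers to really is just a pip install plus a local master. A minimal sketch, assuming pyspark is installed and a hypothetical CSV file exists locally.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .master("local[*]")             # run driver and executors in one local JVM
    .appName("local-exploration")
    .getOrCreate()
)

# Explore a small local file without any cluster services running.
df = spark.read.csv("/tmp/sample.csv", header=True, inferSchema=True)
df.groupBy("country").count().show()
```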

View full review »
AM
CEO

Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them.

View full review »
YT
Business data analyst at RBSG Internet operations

We have plans to increase usage, and this is where we've realized that when we have all these clusters and we're running queries and analyzing, we face some latency issues. I think more of the solution needs to be focused on the parallel processing and retrieval of data.

View full review »
JP
Vice President - Finance & IT at a consumer goods company with 1-10 employees

I'm not sure if I have any ideas as to how to improve the product.

Every year, the solution comes out with new features; Spark is one example. If they continue to release helpful new features, that will continue to increase the value of the solution.

The solution could always improve performance. This is a consistent requirement. Whenever you run it, there is always room for improvement in terms of performance.

The solution needs a better tutorial. Currently, only documents are available. There are a lot of YouTube videos available; however, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning.

We would prefer it if users didn't just get pushed toward certification-based learning, as certifications are expensive. Maybe they could arrange for the certification to be offered at a lower cost. The certification cost is currently around $2,500 or thereabouts.

View full review »
MS
Works

We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it.

View full review »
it_user265830 - PeerSpot reviewer
Senior Hadoop Engineer with 1,001-5,000 employees

The Apache team is doing a great job and releasing Hadoop versions well ahead of what we can think about. Any room for improvement is addressed as soon as a new version is released by the ASF. Currently, Apache Oozie 4.0.1 has some compatibility issues with Hadoop 2.5.2.

View full review »
DD
Partner at a tech services company with 11-50 employees

Hadoop's security could be better.

View full review »
MB
IT Expert at a comms service provider with 1,001-5,000 employees

The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop.

View full review »
YM
CEO at AM-BITS LLC

What needs improvement depends on the customer and the use case. Classical Hadoop, for example, we consider an old variant; most now work with flash data.

There is a very wide application for this solution, but in enterprise companies, if you work with classical BI systems, it would be good to include an additional presentation layer for BI solutions.

There is a lack of visualization and presentation layers, so you can't just take it and implement it like a ready-made solution.

View full review »
MB
IT Expert at a comms service provider with 1,001-5,000 employees

We are using HDTM circuit boards, and I worry about the future of this product and its compatibility with future releases. It's a concern because, for now, we do not have a clear path to upgrade. Hadoop is now on version three, and we'd like to upgrade to that third version, but as far as I know, it's not a simple thing.

There are a lot of features in this product that are open source. If something isn't included with the distribution, we are not limited; we can take things from the internet and integrate them. As far as I know, we are using Presto, which isn't included in HDP (Hortonworks Data Platform), and it works fine. Not everything has to be included in the release. If something is outside of HDP and it works, that is good enough for me. We have the flexibility to incorporate it ourselves.

View full review »
SS
Technical Lead at a government with 201-500 employees

For the visualization tools, we use Apache Hadoop and it is very slow.

It lacks some query language features. We have to use Apache Linux. Even so, the query language still has limitations and only a little documentation, and many of the visualization tools do not have direct connectivity. They need something like BigQuery, which is very fast. We need those capabilities to be available in the cloud and scalable.

The solution needs to be powerful and offer better availability for gathering queries.

The solution is very expensive.

View full review »
CB
Database/Middleware Consultant (Currently at U.S. Department of Labor) at a tech services company with 51-200 employees

It needs better user interface (UI) functionalities.

View full review »
GA
Founder & CTO at a tech services company with 1-10 employees

I don't have any concerns because each part of Hadoop has its use cases. To date, I haven't implemented a huge product or project using Hadoop, but on the level of POCs, it's fine. 

The community of Hadoop is now a cluster; I think there is room for improvement in the ecosystem.

From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective.

View full review »
it_user1093134 - PeerSpot reviewer
Technical Architect at RBSG Internet Operations

We're finding vulnerabilities in running it 24/7. We're experiencing some downtime that affects the data.

It would be good to have more advanced analytics tools.

View full review »
it_user693231 - PeerSpot reviewer
Big Data Engineer at a tech vendor with 5,001-10,000 employees

Rolling restarts of data nodes could be further optimized. Also, I/O operations could be optimized for better performance.

View full review »
Abhik Ray - PeerSpot reviewer
Co-Founder at Quantic

It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake.

View full review »
it_user1208307 - PeerSpot reviewer
Practice Lead (BI/ Data Science) at a tech services company with 11-50 employees

It could be because the solution is open source, and therefore not funded like bigger companies' products, but we find the solution runs slowly.

The solution isn't as mature as SQL or Oracle and therefore lacks many features.

The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment.

View full review »
it_user576504 - PeerSpot reviewer
Software Architect at a tech services company with 10,001+ employees

At the beginning, the MapReduce jobs (MRs) that Hive generated made me think we should drop down to writing Hadoop MRs directly to have better control of the data. But later, Hive matured very well as a platform. I still think a Spark-type layer on top gives you an edge over having only Hive.
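That Spark-type layer on top of Hive can be as thin as Spark SQL reading the existing Hive metastore. A minimal sketch, assuming a Spark build with Hive support, a reachable metastore, and a hypothetical sales.orders table.

```python
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    .appName("spark-on-hive-sketch")
    .enableHiveSupport()            # reuse the existing Hive metastore and tables
    .getOrCreate()
)

# The same HiveQL, but executed by Spark's engine rather than Hive's
# MapReduce jobs.
spark.sql(
    "SELECT region, COUNT(*) AS orders FROM sales.orders GROUP BY region"
).show()
```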

View full review »