Apache Hadoop vs Azure Data Factory comparison

Apache Hadoop: 2,630 views | 2,223 comparisons | 89% willing to recommend
Azure Data Factory: 8,287 views | 6,470 comparisons | 91% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Apache Hadoop and Azure Data Factory based on real PeerSpot user reviews.

Find out in this report how the two Cloud Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Apache Hadoop vs. Azure Data Factory Report (Updated: March 2024).
767,667 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Apache Hadoop:
  • "The tool's stability is good."
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
  • "Hadoop is extensible; it's elastic."
  • "The scalability of Apache Hadoop is very good."
  • "Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
  • "Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
  • "The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
  • "Hadoop File System is compatible with almost all the query engines."

Azure Data Factory:
  • "For me, it was that there are dedicated connectors for different targets or sources, different data sources. For example, there is direct connector to Salesforce, Oracle Service Cloud, etcetera, and that was really helpful."
  • "The data factory agent is quite good and programming or defining the value of jobs, processes, and activities is easy."
  • "The trigger scheduling options are decently robust."
  • "From what we have seen so far, the solution seems very stable."
  • "The solution can scale very easily."
  • "The most important feature is that it can help you do the multi-threading concepts."
  • "The feature I found most helpful in Azure Data Factory is the pipeline feature, including being able to connect to different sources. Azure Data Factory also has built-in security, which is another valuable feature."
  • "The most valuable feature of this solution would be ease of use."

Cons
Apache Hadoop:
  • "It needs better user interface (UI) functionalities."
  • "The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
  • "The load optimization capabilities of the product are an area of concern where improvements are required."
  • "It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
  • "In certain cases, the configurations for dealing with data skewness do not make any sense."
  • "Real-time data processing is weak. This solution is very difficult to run and implement."
  • "The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
  • "General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."

Azure Data Factory:
  • "There are limitations when processing more than one GD file."
  • "The need to work more on developing out-of-the-box connectors for other products like Oracle, AWS, and others."
  • "Azure Data Factory should be cheaper to move data to a data center abroad for calamities in case of disasters."
  • "The product's technical support has certain shortcomings, making it an area where improvements are required."
  • "I have not found any real shortcomings within the product."
  • "One area for improvement is documentation. At present, there isn't enough documentation on how to use Azure Data Factory in certain conditions. It would be good to have documentation on the various use cases."
  • "Lacks a decent UI that would give us a view of the kinds of requests that come in."
  • "The solution can be improved by decreasing the warmup time which currently can take up to five minutes."

Pricing and Cost Advice
Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently and hence purchasing a 'boxed' / 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

Azure Data Factory:
  • "In terms of licensing costs, we pay somewhere around $14,000 USD per month. There are some additional costs. For example, we would have to subscribe to some additional computing and for elasticity, but they are minimal."
  • "This is a cost-effective solution."
  • "The price you pay is determined by how much you use it."
  • "Understanding the pricing model for Data Factory is quite complex."
  • "I would not say that this product is overly expensive."
  • "The licensing is a pay-as-you-go model, where you pay for what you consume."
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."

    Questions from the Community
    Top Answer: Hadoop File System is compatible with almost all the query engines.
    Top Answer: The tool provides functionalities to deal with data skewness or a diverse set of data. There are some configurations that it usually provides. In certain cases, the configurations for dealing with…
    Top Answer: AWS Glue and Azure Data Factory for ELT best performance cloud services.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up and…
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; It has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power…
    Ranking
    Metric                     Apache Hadoop                    Azure Data Factory
    Ranking                    5th out of 34 in Data Warehouse  3rd
    Views                      2,630                            8,287
    Comparisons                2,223                            6,470
    Reviews                    11                               46
    Average Words per Review   532                              489
    Rating                     8.0                              8.0
    Overview
    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.

    Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.

    Sample Customers
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
    Top Industries
    Apache Hadoop
      Reviewers: Financial Services Firm 38%, Comms Service Provider 25%, Hospitality Company 6%, Consumer Goods Company 6%
      Visitors reading reviews: Financial Services Firm 27%, Computer Software Company 10%, Comms Service Provider 6%, University 6%
    Azure Data Factory
      Reviewers: Computer Software Company 34%, Insurance Company 11%, Manufacturing Company 8%, Financial Services Firm 8%
      Visitors reading reviews: Computer Software Company 13%, Financial Services Firm 13%, Manufacturing Company 8%, Healthcare Company 7%
    Company Size
    Apache Hadoop
      Reviewers: Small Business 34%, Midsize Enterprise 23%, Large Enterprise 43%
      Visitors reading reviews: Small Business 15%, Midsize Enterprise 10%, Large Enterprise 75%
    Azure Data Factory
      Reviewers: Small Business 29%, Midsize Enterprise 19%, Large Enterprise 52%
      Visitors reading reviews: Small Business 18%, Midsize Enterprise 13%, Large Enterprise 70%

    Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews while Azure Data Factory is ranked 3rd in Cloud Data Warehouse with 81 reviews. Apache Hadoop is rated 7.8, while Azure Data Factory is rated 8.0. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake, Teradata and BigQuery, whereas Azure Data Factory is most compared with Informatica PowerCenter, Informatica Cloud Data Integration, Alteryx Designer, Snowflake and Microsoft Azure Synapse Analytics.

    We monitor all Cloud Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.