Apache Hadoop vs Azure Data Factory comparison

Apache Hadoop: 2,387 views | 2,021 comparisons | 87% willing to recommend
Azure Data Factory: 7,883 views | 6,192 comparisons | 91% willing to recommend
Comparison Buyer's Guide
Executive Summary

We performed a comparison between Apache Hadoop and Azure Data Factory based on real PeerSpot user reviews.

Find out in this report how the two Cloud Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
To learn more, read our detailed Apache Hadoop vs. Azure Data Factory Report (Updated: May 2024).
772,679 professionals have used our research since 2012.
Quotes From Members
We asked business professionals to review the solutions they use.
Here are some excerpts of what they said:
Pros
Apache Hadoop:
  • "The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
  • "The best thing about this solution is that it is very powerful and very cheap."
  • "It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
  • "The scalability of Apache Hadoop is very good."
  • "The most valuable features are powerful tools for ingestion, as data is in multiple systems."
  • "Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
  • "It's open-source, so it's very cost-effective."
  • "High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."

Azure Data Factory:
  • "The most valuable feature of this solution would be ease of use."
  • "Data Factory itself is great. It's pretty straightforward. You can easily add sources, join and lookup information, etc. The ease of use is pretty good."
  • "It is easy to integrate."
  • "The scalability of the product is impressive."
  • "The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
  • "One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools. It's also very convenient because it has multiple connectors. The availability of native connectors allows you to connect to several resources to analyze data streams."
  • "I enjoy the ease of use for the backend JSON generator, the deployment solution, and the template management."
  • "From my experience so far, the best feature is the ability to copy data to any environment. We have 100 connects and we can connect them to the system and copy the data from its respective system to any environment. That is the best feature."

Cons
Apache Hadoop:
  • "What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly."
  • "It would be good to have more advanced analytics tools."
  • "Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
  • "The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
  • "The integration with Apache Hadoop with lots of different techniques within your business can be a challenge."
  • "There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution."
  • "Real-time data processing is weak. This solution is very difficult to run and implement."
  • "The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."

Azure Data Factory:
  • "It can improve from the perspective of active logging. It can provide active logging information."
  • "Snowflake connectivity was recently added and if the vendor provided some videos on how to create data then that would be helpful."
  • "There aren't many third-party extensions or plugins available in the solution."
  • "This solution is currently only useful for basic data movement and file extractions, which we would like to see developed to handle more complex data transformations."
  • "Data Factory has so many features that it can be a little difficult or confusing to find some settings and configurations. I'm sure there's a way to make it a little easier to navigate."
  • "There is no built-in function for automatically adding notifications concerning the progress or outline of a pipeline run."
  • "There is room for improvement primarily in its streaming capabilities. For structured streaming and machine learning model implementation within an ETL process, it lags behind tools like Informatica."
  • "Data Factory's cost is too high."

Pricing and Cost Advice
Apache Hadoop:
  • "Do take into consideration that data storage and compute capacity scale differently, and hence purchasing a 'boxed' / 'all-in-one' solution (software and hardware) might not be the best idea."
  • "There are no licensing costs involved, hence money is saved on the software infrastructure."
  • "This is a low cost and powerful solution."
  • "The price of Apache Hadoop could be less expensive."
  • "If my company can use the cloud version of Apache Hadoop, particularly the cloud storage feature, it would be easier and would cost less because an on-premises deployment has a higher cost during storage, for example, though I don't know exactly how much Apache Hadoop costs."
  • "We don't directly pay for it. Our clients pay for it, and they usually don't complain about the price. So, it is probably acceptable."
  • "The price could be better. Hortonworks no longer exists, and Cloudera killed the free version of Hadoop."
  • "We just use the free version."

Azure Data Factory:
  • "In terms of licensing costs, we pay somewhere around $14,000 USD per month. There are some additional costs. For example, we would have to subscribe to some additional computing and for elasticity, but they are minimal."
  • "This is a cost-effective solution."
  • "The price you pay is determined by how much you use it."
  • "Understanding the pricing model for Data Factory is quite complex."
  • "I would not say that this product is overly expensive."
  • "The licensing is a pay-as-you-go model, where you pay for what you consume."
  • "Our licensing fees are approximately 15,000 ($150 USD) per month."
  • "The licensing cost is included in the Synapse."

    Questions from the Community
    Top Answer: It's primarily open source. You can handle huge data volumes and create your own views, workflows, and tables. I can also use it for real-time data streaming.
    Top Answer: Since it is an open-source product, there won't be much support. So, you have to have deeper knowledge. You need to improvise based on that.
    Top Answer: AWS Glue and Azure Data Factory are the best-performing cloud services for ELT.
    Top Answer: Azure Data Factory is flexible, modular, and works well. In terms of cost, it is not too pricey. It offers the stability and reliability I am looking for, good scalability, and is easy to set up and…
    Top Answer: Azure Data Factory is a solid product offering many transformation functions; it has pre-load and post-load transformations, allowing users to apply transformations either in code by using Power…
    Ranking
    Metric                      Apache Hadoop                      Azure Data Factory
    Ranking                     6th out of 35 in Data Warehouse    3rd
    Views                       2,387                              7,883
    Comparisons                 2,021                              6,192
    Reviews                     13                                 45
    Average Words per Review    530                                507
    Rating                      7.8                                8.0
    Overview
    The Apache Hadoop project develops open-source software for reliable, scalable, distributed computing. The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures.
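
The "simple programming models" mentioned above usually refers to MapReduce. As a rough illustration only, the sketch below imitates the map, shuffle, and reduce steps for a word count in a single Python process. It does not use Hadoop or any Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical names chosen for this example.

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(line: str) -> Iterator[tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, standing in for the framework's shuffle step."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines: Iterable[str]) -> dict[str, int]:
    """Run map, shuffle, and reduce in sequence on in-memory lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle_phase(pairs)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(word_count(["big data", "big cluster"]))
```

In a real cluster the map and reduce steps would run on many machines, each working on locally stored data, while the framework handles grouping and failure recovery; this sketch only shows the shape of the model.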

    Azure Data Factory efficiently manages and integrates data from various sources, enabling seamless movement and transformation across platforms. Its valuable features include seamless integration with Azure services, handling large data volumes, flexible transformation, user-friendly interface, extensive connectors, and scalability. Users have experienced improved team performance, workflow simplification, enhanced collaboration, streamlined processes, and boosted productivity.

    Sample Customers
    Apache Hadoop: Amazon, Adobe, eBay, Facebook, Google, Hulu, IBM, LinkedIn, Microsoft, Spotify, AOL, Twitter, University of Maryland, Yahoo!, Cornell University Web Lab
    Azure Data Factory: Adobe, BMW, Coca-Cola, General Electric, Johnson & Johnson, LinkedIn, Mastercard, Nestle, Pfizer, Samsung, Siemens, Toyota, Unilever, Verizon, Walmart, Accenture, American Express, AT&T, Bank of America, Cisco, Deloitte, ExxonMobil, Ford, General Motors, IBM, JPMorgan Chase, Microsoft (Azure Data Factory is developed by Microsoft), Oracle, Procter & Gamble, Salesforce, Shell, Visa
    Top Industries
    Apache Hadoop
    REVIEWERS
    Financial Services Firm: 35%
    Comms Service Provider: 24%
    Retailer: 6%
    Manufacturing Company: 6%
    VISITORS READING REVIEWS
    Financial Services Firm: 29%
    Computer Software Company: 11%
    University: 6%
    Manufacturing Company: 5%
    Azure Data Factory
    REVIEWERS
    Computer Software Company: 34%
    Insurance Company: 11%
    Manufacturing Company: 8%
    Financial Services Firm: 8%
    VISITORS READING REVIEWS
    Computer Software Company: 13%
    Financial Services Firm: 13%
    Manufacturing Company: 8%
    Healthcare Company: 7%
    Company Size
    Apache Hadoop
    REVIEWERS
    Small Business: 33%
    Midsize Enterprise: 19%
    Large Enterprise: 47%
    VISITORS READING REVIEWS
    Small Business: 15%
    Midsize Enterprise: 11%
    Large Enterprise: 74%
    Azure Data Factory
    REVIEWERS
    Small Business: 29%
    Midsize Enterprise: 19%
    Large Enterprise: 52%
    VISITORS READING REVIEWS
    Small Business: 18%
    Midsize Enterprise: 13%
    Large Enterprise: 69%

    Apache Hadoop is ranked 6th in Data Warehouse with 34 reviews while Azure Data Factory is ranked 3rd in Cloud Data Warehouse with 81 reviews. Apache Hadoop is rated 7.8, while Azure Data Factory is rated 8.0. The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". On the other hand, the top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake, Teradata and BigQuery, whereas Azure Data Factory is most compared with Informatica PowerCenter, Informatica Cloud Data Integration, Alteryx Designer, Snowflake and IBM InfoSphere DataStage.


    We monitor all Cloud Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.