We performed a comparison between Apache Hadoop and Azure Data Factory based on real PeerSpot user reviews.
Find out in this report how the two Cloud Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"The best thing about this solution is that it is very powerful and very cheap."
"It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
"The scalability of Apache Hadoop is very good."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
"It's open-source, so it's very cost-effective."
"High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."
"The most valuable feature of this solution would be ease of use."
"Data Factory itself is great. It's pretty straightforward. You can easily add sources, join and lookup information, etc. The ease of use is pretty good."
"It is easy to integrate."
"The scalability of the product is impressive."
"The most valuable feature of Azure Data Factory is that it has a good combination of flexibility, fine-tuning, automation, and good monitoring."
"One advantage of Azure Data Factory is that it's fast, unlike SSIS and other on-premise tools. It's also very convenient because it has multiple connectors. The availability of native connectors allows you to connect to several resources to analyze data streams."
"I enjoy the ease of use for the backend JSON generator, the deployment solution, and the template management."
"From my experience so far, the best feature is the ability to copy data to any environment. We have 100 connects and we can connect them to the system and copy the data from its respective system to any environment. That is the best feature."
"What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly."
"It would be good to have more advanced analytics tools."
"Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
"The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
"The integration with Apache Hadoop with lots of different techniques within your business can be a challenge."
"There is a lack of virtualization and presentation layers, so you can't take it and implement it like a radio solution."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
"It can improve from the perspective of active logging. It can provide active logging information."
"Snowflake connectivity was recently added and if the vendor provided some videos on how to create data then that would be helpful."
"There aren't many third-party extensions or plugins available in the solution."
"This solution is currently only useful for basic data movement and file extractions, which we would like to see developed to handle more complex data transformations."
"Data Factory has so many features that it can be a little difficult or confusing to find some settings and configurations. I'm sure there's a way to make it a little easier to navigate."
"There is no built-in function for automatically adding notifications concerning the progress or outline of a pipeline run."
"There is room for improvement primarily in its streaming capabilities. For structured streaming and machine learning model implementation within an ETL process, it lags behind tools like Informatica."
"Data Factory's cost is too high."
Apache Hadoop is ranked 6th in Data Warehouse with 34 reviews while Azure Data Factory is ranked 3rd in Cloud Data Warehouse with 81 reviews. Apache Hadoop is rated 7.8, while Azure Data Factory is rated 8.0. The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". On the other hand, the top reviewer of Azure Data Factory writes "The data factory agent is quite good but pricing needs to be more transparent". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake, Teradata and BigQuery, whereas Azure Data Factory is most compared with Informatica PowerCenter, Informatica Cloud Data Integration, Alteryx Designer, Snowflake and IBM InfoSphere DataStage.
We monitor all Cloud Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.