We performed a comparison between Apache Hadoop and Microsoft Parallel Data Warehouse based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
"As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R."
"Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The most valuable feature is scalability and the possibility to work with major information and open source capability."
"The most valuable features are the ability to process the machine data at a high speed, and to add structure to our data so that we can generate relevant analytics."
"Data ingestion: It has rapid speed, if Apache Accumulo is used."
"I liked that Apache Hadoop was powerful, had a lot of tools, and the fact that it was free and community-developed."
"The most valuable feature is the business intelligence (BI) part of it."
"It handles high volumes of data very well."
"The solution's integration is good."
"I am very satisfied with the customer service/technical support."
"We have complete control over our data."
"Microsoft Parallel Data Warehouse integrates beautifully with other Microsoft ecosystem products."
"The most valuable features are the performance and usability."
"I like Data Warehouse's data integrity features. Data integrity is what databases are made for as opposed to spreadsheets."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it."
"I mentioned it definitely, and this is probably the only feature we can improve a little bit because the terminal and coding screen on Hadoop is a little outdated, and it looks like the old C++ BIOS screen. If the UI and UX can be improved slightly, I believe it will go a long way toward increasing adoption and effectiveness."
"It would be good to have more advanced analytics tools."
"The upgrade path should be improved because it is not as easy as it should be."
"The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."
"The integration with Apache Hadoop with lots of different techniques within your business can be a challenge."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"This solution would be improved with an option for in-memory data analysis."
"The reporting for certain types of data needs to be improved."
"The query is slow if we don't optimize it."
"They need to incorporate a machine learning engine."
"I would like the ability to do more real-time type updates instead of batch-oriented updates."
"In the future I would love to see a slightly better automation engine, just for the data integration layer, to make it slightly easier for end-users or junior developers to get involved in incremental updating."
"Concurrent queries are limited to 32, making it more of a data storage mechanism instead of an active DWH solution."
"We find the cost of the solution to be a little high."
Apache Hadoop is ranked 5th in Data Warehouse with 11 reviews while Microsoft Parallel Data Warehouse is ranked 8th in Data Warehouse with 12 reviews. Apache Hadoop is rated 7.8, while Microsoft Parallel Data Warehouse is rated 7.6. The top reviewer of Apache Hadoop writes "Has good processing power and speed and is capable of handling large volumes of data and doing online analysis". On the other hand, the top reviewer of Microsoft Parallel Data Warehouse writes "User-friendly UI and good support". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Azure Data Factory, Oracle Exadata, Snowflake and Teradata, whereas Microsoft Parallel Data Warehouse is most compared with Microsoft Azure Synapse Analytics, Oracle Exadata, SAP BW4HANA, Snowflake and VMware Tanzu Greenplum.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.