We performed a comparison between Apache Hadoop and Vertica based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
"Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
"The performance is pretty good."
"Apache Hadoop is crucial in projects that save and retrieve data daily. Its valuable features are scalability and stability. It is easy to integrate with the existing infrastructure."
"The tool's stability is good."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"High throughput and low latency. We start with data mashing on Hive and finally use this for KPI visualization."
"Vertica is a great product because customers can compress and code data. The infrastructure that data warehouse solutions need is a commodity server so that customers don't have to invest in infrastructure."
"It maximizes cloud economics with Eon Mode by scaling cluster size to meet variable workload demands."
"The most valuable feature of Vertica is the unmatchable database performance."
"The feature of the product that is most important is the speed. I needed a columnar database, and its speed is what it's built to do, and so that's what really does differentiate Vertica from its competitors."
"The most valuable feature of Vertica is the ability to receive large aggregations at a very quick pace. The use case of subclusters is very good."
"Integrated R and geospatial functions are helping us improve efficiency and explore new revenue streams."
"The product's initial setup phase is extremely simple."
"The fast columnar store database structure allows our query times to be at least 10x faster than on any other database."
"The solution could use a better user interface. It needs a more effective GUI in order to create a better user environment."
"In the next release, I would like to see Hive more responsive for smaller queries and to reduce the latency."
"The stability of the solution needs improvement."
"Real-time data processing is weak. This solution is very difficult to run and implement."
"The key shortcoming is its inability to handle queries when there is insufficient memory. This limitation can be bypassed by processing the data in chunks."
"What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe it's because I'm new to it. Sometimes it feels so tough to use, but it could be because of two aspects: one is my incompetency, for example, I don't know about all the features of Apache Hadoop, or maybe it's because of the limitations of the platform. For example, my team is maintaining the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, an advanced level of coding or programming needs to be done in the back end, so it's not user-friendly."
"I would like to see more direct integration of visualization applications."
"The solution is very expensive."
"The documentation of Vertica is an area with shortcomings where improvements are required."
"Performance of management of metadata layer (database catalog) needs improvement. We still have to have smaller customers on PostgreSQL; Vertica cannot manage thousands of schemata."
"Vertica's native cloud support could be improved, and its installation could be made easier."
"In a future release, we would like to have artificial intelligence capabilities like neural networks. Customers are demanding this type of analytics."
"It needs integration with multiple clouds."
"We faced some challenges when trying to use the temporary tables feature."
"We are looking for a cheaper deployment for the solution. Although we did a lot of benchmarks, like Redshift. We tried Redshift, it didn't work. It didn't work out for us as well."
"The integration with AI has room for improvement."
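One of the Hadoop reviewers above notes that the out-of-memory limitation "can be bypassed by processing the data in chunks." The following is a minimal, generic sketch of that idea in plain Python; it does not use any Hadoop or Hive API. The helper names `chunks` and `chunked_sum` are hypothetical and chosen only for illustration, and the chunk size is an arbitrary assumption.

```python
from itertools import islice
from typing import Iterable, Iterator, List


def chunks(rows: Iterable[int], size: int) -> Iterator[List[int]]:
    """Yield successive lists of at most `size` items from `rows`."""
    it = iter(rows)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def chunked_sum(rows: Iterable[int], size: int) -> int:
    """Aggregate a stream while holding only one chunk in memory at a time."""
    total = 0
    for batch in chunks(rows, size):
        total += sum(batch)  # partial aggregate for this chunk only
    return total


if __name__ == "__main__":
    # The input is a lazy range, so the full data set is never materialized.
    print(chunked_sum(range(1_000_000), size=10_000))
```

The same pattern applies to any aggregate that can be combined from partial results (sums, counts, minima, maxima); aggregates that need the whole data set at once, such as an exact median, would need a different approach.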
Apache Hadoop is ranked 6th in Data Warehouse with 34 reviews while Vertica is ranked 4th in Data Warehouse with 83 reviews. Apache Hadoop is rated 7.8, while Vertica is rated 8.2. The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". On the other hand, the top reviewer of Vertica writes "A user-friendly tool that needs to improve its documentation part". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and SAP IQ, whereas Vertica is most compared with Snowflake, SQL Server, Amazon Redshift, Teradata and BigQuery. See our Apache Hadoop vs. Vertica report.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
SQreamDB is a GPU database. It is, of course, not suitable for real-time OLTP.
Cassandra is best suited for OLTP use cases, when you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited for OLAP purposes. It is best suited to a very large data warehouse with very large queries that need massively parallel execution, since GPUs excel at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card, the better, since we will have more CPU activity. It is only for a very big data warehouse, not for small ones.
Your best DB for 40+ TB is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic store (S3, Azure Blob Storage, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob store, Databricks, PySpark), you may need around 500 32 GB nodes to read 40 TB of data.
Costs can be contained by running your own clusters, but Databricks manages clusters for you.
I would recommend optimizing your 40 TB data store into the Databricks Delta format after an initial parse.
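As a rough sanity check on the sizing figures in this answer (about 500 nodes with 32 GB each, 40 TB of data, a 20-minute query budget), the arithmetic can be sketched as below. This is only a back-of-envelope calculation under stated assumptions: decimal units (1 TB = 1000 GB), a single full scan of the data, and perfectly even distribution across nodes. The function names are hypothetical, and real throughput depends on the storage tier, file format, and network.

```python
# Figures taken from the answer above; everything else is an assumption.
DATA_TB = 40          # data volume to scan
NODES = 500           # approximate node count
NODE_MEMORY_GB = 32   # memory per node
TARGET_MINUTES = 20   # time budget for a query


def cluster_memory_tb(nodes: int, node_memory_gb: float) -> float:
    """Aggregate cluster memory in TB (decimal units)."""
    return nodes * node_memory_gb / 1000


def per_node_read_mb_s(data_tb: float, nodes: int, minutes: float) -> float:
    """Read rate (MB/s) each node must sustain to scan the data once in time."""
    total_mb = data_tb * 1_000_000
    return total_mb / (nodes * minutes * 60)


if __name__ == "__main__":
    mem = cluster_memory_tb(NODES, NODE_MEMORY_GB)
    rate = per_node_read_mb_s(DATA_TB, NODES, TARGET_MINUTES)
    print(f"aggregate memory: {mem:.1f} TB")
    print(f"data-to-memory ratio: {DATA_TB / mem:.2f}")
    print(f"required read rate per node: {rate:.1f} MB/s")
```

Under these assumptions the cluster holds less memory than the data set, so the data would be streamed from the object store rather than cached in full, and the per-node read rate is the figure to compare against what the chosen storage can deliver.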
Morten, the most popular comparisons of SQream can be found here: www.itcentralstation.com
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.