We performed a comparison between Apache Hadoop and Vertica based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"It's open-source, so it's very cost-effective."
"The most valuable features are powerful tools for ingestion, as data is in multiple systems."
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding since we started using it since we started out so small. Companies that need to scale shouldn't have a problem doing so."
"Data ingestion: It has rapid speed, if Apache Accumulo is used."
"The performance is pretty good."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
"Vertica enabled us to close large deals. Customers with large data sets had to be migrated from PostgreSQL to Vertica due to performance."
"I appreciate the flexibility offered by Vertica's projections. It allows for modifying the primary projection without altering the tables, which helps to optimize queries without the need to modify the underlying data."
"The Vertica architecture means it can process/ingest data in parallel to reporting and analyzing because of its in-memory Write-Optimized Storage sitting alongside the analytics optimized Read-Optimized Storage."
"We are also opening new areas of business and potential new revenue streams using Vertica's analytic functions, most notably geospatial, where we are able to run billions of comparisons of lat/long point locations against polygon and point/radius locations in seconds. "
"The hardware usage and speed has been the most valuable feature of this solution. It is very fast and has saved us a lot of money."
"DBAs don’t need to add a partition every month/quarter like with other DBs."
"I don't need any special hardware. I can use commodity hardware, which is nice to have in a commercial solution."
"Vertica is a columnar database, this support our developments in analytics, advanced analytics, and ETL process with large sets of data."
"The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
"It needs better user interface (UI) functionalities."
"The solution is very expensive."
"It would be good to have more advanced analytics tools."
"It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
"General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."
"We would like to have more dynamics in merging this machine data with other internal data to make more meaning out of it."
"We are looking for a cheaper deployment for the solution. Although we did a lot of benchmarks, like Redshift. We tried Redshift, it didn't work. It didn't work out for us as well."
"The integration with AI has room for improvement."
"The biggest problem is the cost of cloud deployment."
"Suboptimal projection design causes queries to not scale linearly."
"They could improve the integration and some of the features in the cloud version."
"If you do not utilize the tuning tools like projections, encoding, partitions, and statistics, then performance and scalability will suffer."
"The integration of this solution with ODI could be improved."
"When it is about to reach the maximum storage capacity, it becomes slow."
Apache Hadoop is ranked 5th in Data Warehouse with 11 reviews while Vertica is ranked 4th in Data Warehouse with 10 reviews. Apache Hadoop is rated 7.8, while Vertica is rated 8.4. The top reviewer of Apache Hadoop writes "Has good processing power and speed and is capable of handling large volumes of data and doing online analysis". On the other hand, the top reviewer of Vertica writes "Reliable and feature-rich, with optimization techniques to fine-tune queries for faster report generation". Apache Hadoop is most compared with Microsoft Azure Synapse Analytics, Azure Data Factory, Oracle Exadata, Snowflake and Oracle Big Data Appliance, whereas Vertica is most compared with Snowflake, SQL Server, Amazon Redshift, Teradata and SingleStore. See our Apache Hadoop vs. Vertica report.
See our list of best Data Warehouse vendors and best Cloud Data Warehouse vendors.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
SQreamDB is a GPU database. It is not suitable for real-time OLTP, of course.
Cassandra is best suited for OLTP database use cases, when you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited for OLAP purposes. It is best suited to very large data warehouses and very large queries that need massively parallel activity, since GPUs are great at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card, the better, since we will have more GPU capacity. It's only for a very big data warehouse, not for small ones.
Your best DB for 40+ TB is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic store (S3, Azure Blob, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob store, Databricks, PySpark), you may need around 500 nodes with 32 GB each for reading 40 TB of data.
Costs can be contained by running your own clusters, but Databricks manages the clusters for you.
I would recommend optimizing your 40 TB data store into the Databricks Delta format after an initial parse.
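A minimal PySpark sketch of that workflow, assuming a Databricks-style environment where Delta Lake is available and using placeholder storage paths and column names, might look like this:

    # Rough sketch: parse the raw files from cloud object storage once, then
    # persist them as Delta so later queries read a compact, columnar copy.
    # Paths, the CSV format, and the 'event_date' column are assumptions.
    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("raw-to-delta").getOrCreate()

    raw_path = "abfss://raw@examplestorage.dfs.core.windows.net/events/"              # placeholder
    delta_path = "abfss://curated@examplestorage.dfs.core.windows.net/events_delta/"  # placeholder

    # Initial parse of the raw data (assumed here to be CSV with a header row).
    raw_df = (spark.read
              .option("header", "true")
              .option("inferSchema", "true")
              .csv(raw_path))

    # One-time write into Delta, partitioned so later queries can prune files.
    (raw_df.write
        .format("delta")
        .mode("overwrite")
        .partitionBy("event_date")
        .save(delta_path))

    # Subsequent analysis reads the optimized Delta copy instead of raw files.
    events = spark.read.format("delta").load(delta_path)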
Morten, the most popular comparisons of SQream can be found here: https://www.itcentralstation.com/products/sqream-db-alternatives-and-competitors
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.