We performed a comparison between Apache Hadoop and Vertica based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."
"As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R."
"Apache Hadoop can manage large amounts and volumes of data with relative ease, which is a feature that is beneficial."
"Initially, with RDBMS alone, we had a lot of work and few servers running on-premise and on cloud for the PoC and incubation. With the use of Hadoop and ecosystem components and tools, and managing it in Amazon EC2, we have created a Big Data "lab" which helps us to centralize all our work and solutions into a single repository. This has cut down the time in terms of maintenance, development and, especially, data processing challenges."
"What comes with the standard setup is what we mostly use, but Ambari is the most important."
"The most important feature is its ability to handle large volumes. Some of our customers have really large volumes, and it is capable of handling their data in terms of the core volume and daily incremental volume. So, its processing power and speed are most valuable."
"The solution is easy to expand. We haven't seen any issues with it in that sense. We've added 10 servers, and we've added two nodes. We've been expanding ever since we started using it, because we started out so small. Companies that need to scale shouldn't have a problem doing so."
"The most valuable feature is the database."
"We are also opening new areas of business and potential new revenue streams using Vertica's analytic functions, most notably geospatial, where we are able to run billions of comparisons of lat/long point locations against polygon and point/radius locations in seconds."
"The most valuable feature of Vertica is the ability to receive large aggregations at a very quick pace. The use case of subclusters is very good."
"Vertica is a columnar database; this supports our development work in analytics, advanced analytics, and ETL processes with large data sets."
"It maximizes cloud economics for mission-critical big data analytical initiatives."
"Its projections and encoding are excellent tools for tuning large volumes."
"It's the fastest database I have ever tested. That's the most important feature of Vertica."
"The hardware usage and speed have been the most valuable features of this solution. It is very fast and has saved us a lot of money."
"It allows us to take volumes of data and process them at a very high speed."
"It would be helpful to have more information on how to best apply this solution to smaller organizations, with less data, and grow the data lake."
"The solution needs a better tutorial. Currently, only documents are available. There are a lot of YouTube videos available; however, we didn't have much success trying to learn that way. There needs to be better self-paced learning."
"It could be more user-friendly."
"It has a steep learning curve. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities overlap, and for a newcomer or for our clients, it is very difficult to decide which of them to buy and which of them they don't really need. They need a consulting organization for this, which is good for organizations such as ours because that's what we do, but it is not easy for end customers to gain so much knowledge and use it optimally."
"General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."
"What could be improved in Apache Hadoop is its user-friendliness. It's not that user-friendly, but maybe that's because I'm new to it. Sometimes it feels very tough to use, and that could be for one of two reasons: my own inexperience (for example, I don't know all the features of Apache Hadoop), or the limitations of the platform. For example, my team maintains the business glossary in Apache Atlas, but if you want to change any settings at the GUI level, advanced coding or programming needs to be done in the back end, so it's not user-friendly."
"It needs better user interface (UI) functionalities."
"It would be good to have more advanced analytics tools."
"Some of our small to medium-sized customers would like to see containerization and flexibility from the deployment standpoint."
"I would personally like to see extended developer tooling suited to Vertica – think published PowerDesigner SQL dialect support."
"The geospatial functionality could be designed better."
"The integration with AI has room for improvement."
"Limitations in group by projections is where I would like to see an improvement."
"The integration of this solution with ODI could be improved."
"It would be great if this were a managed service in AWS."
"They could improve the integration and some of the features in the cloud version."
Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews while Vertica is ranked 4th in Data Warehouse with 83 reviews. Apache Hadoop is rated 7.8, while Vertica is rated 8.2. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Vertica writes "A user-friendly tool that needs to improve its documentation part". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Oracle Big Data Appliance, whereas Vertica is most compared with Snowflake, SQL Server, Amazon Redshift, Teradata and SingleStore. See our Apache Hadoop vs. Vertica report.
See our list of best Data Warehouse vendors and best Cloud Data Warehouse vendors.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
SQreamDB is a GPU DB. It is not suitable for real-time OLTP, of course.
Cassandra is best suited for OLTP database use cases when you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited for OLAP purposes. It is best suited for very large data warehouses and very large queries that need massively parallel processing, since GPUs excel at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card, the better, since we will have more CPU activity. It's only for a very big data warehouse, not for small ones.
Your best option for 40+ TB of data is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic store (S3, Azure Blob Storage, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob Storage, Databricks, PySpark), you may need around 500 nodes with 32 GB each to read 40 TB of data.
Costs can be contained by running your own clusters, but Databricks manages clusters for you.
I would recommend converting your 40 TB data store into the Databricks Delta format after an initial parse.
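To put the sizing figures above in perspective, here is a back-of-envelope calculation. It is a minimal sketch, assuming that "32 GB" refers to memory per node, that 1 TB = 1000 GB, and that the data is spread evenly across nodes; the function name `cluster_sizing` is hypothetical and not part of Spark or Databricks.

```python
# Back-of-envelope cluster sizing for the 40 TB / 500-node figures quoted above.
# Assumptions (illustrative only): "32 GB" is memory per node, 1 TB = 1000 GB,
# and the data is distributed evenly across all nodes.


def cluster_sizing(data_tb: float, nodes: int, mem_per_node_gb: float) -> dict:
    """Return the per-node data share and the aggregate cluster memory, in GB."""
    data_gb = data_tb * 1000
    total_mem_gb = nodes * mem_per_node_gb
    return {
        # Raw input each node must read if the work is split evenly.
        "data_per_node_gb": data_gb / nodes,
        # Memory available across the whole cluster.
        "total_memory_gb": total_mem_gb,
        # Fraction of the raw data that could sit in cluster memory at once.
        "memory_to_data_ratio": total_mem_gb / data_gb,
    }


if __name__ == "__main__":
    print(cluster_sizing(data_tb=40, nodes=500, mem_per_node_gb=32))
```

Under these assumptions, each node reads about 80 GB of raw input against 32 GB of memory, and the whole cluster can hold only about 40% of the data at once. Spark therefore streams partitions through memory rather than caching the full data set, which is one reason a one-time conversion to a columnar format such as Delta helps: queries that touch only some columns read fewer bytes.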
Morten, the most popular comparisons of SQream can be found here: www.itcentralstation.com
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.