We performed a comparison between Apache Hadoop and Vertica based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"Its integration is Hadoop's best feature because that allows us to support different tools in a big data platform."
"The scalability of Apache Hadoop is very good."
"It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
"The most valuable feature is the database."
"The ability to add multiple nodes without any restriction is the solution's most valuable aspect."
"It's open-source, so it's very cost-effective."
"Data ingestion: It has rapid speed, if Apache Accumulo is used."
"Hadoop is designed to be scalable, so I don't think that it has limitations in regards to scalability."
"I like the projection feature, which increases query performance."
"It's the fastest database I have ever tested. That's the most important feature of Vertica."
"Its projections and encoding are excellent tools for tuning large volumes."
"DBAs don’t need to add a partition every month/quarter like with other DBs."
"Allows us to take volumes and process them at a very high speed."
"I appreciate the flexibility offered by Vertica's projections. It allows for modifying the primary projection without altering the tables, which helps to optimize queries without the need to modify the underlying data."
"It maximizes cloud economics for mission-critical big data analytical initiatives."
"Vertica has a few features that I like. From an architecture standpoint, they have separated compute and storage. So you have low-cost object storage for primary storage and the ability to have several sub-clusters working off the same ObjectStore. So it provides workload isolation."
"Based on our needs, we would like to see a tool for data visualization and enhanced Ambari for management, plus a pre-built IoT hub/model. These would reduce our efforts and the time needed to prove to a customer that this will help them."
"The price could be better. I think we would use it more, but the company didn't want to pay for it. Hortonworks doesn't exist anymore, and Cloudera killed the free version of Hadoop."
"It requires a great deal of learning curve to understand. The overall Hadoop ecosystem has a large number of sub-products. There is ZooKeeper, and there are a whole lot of other things that are connected. In many cases, their functionalities are overlapping, and for a newcomer or our clients, it is very difficult to decide which of them to buy and which of them they don't really need. They require a consulting organization for it, which is good for organizations such as ours because that's what we do, but it is not easy for the end customers to gain so much knowledge and optimally use it."
"I think more of the solution needs to be focused around the panel processing and retrieval of data."
"The solution is not easy to use. The solution should be easy to use and suitable for almost any case connected with the use of big data or working with large amounts of data."
"The solution is very expensive."
"Integrating Apache Hadoop with the many different technologies in your business can be a challenge."
"In certain cases, the configurations for dealing with data skewness do not make any sense."
"It should provide a GUI interface for data management and tuning."
"Vertica can improve automation and documentation. Additionally, the solution can be simplified."
"The geospatial functionality could be designed better."
"The biggest problem is the cost of cloud deployment."
"When it is about to reach the maximum storage capacity, it becomes slow."
"If you do not utilize the tuning tools like projections, encoding, partitions, and statistics, then performance and scalability will suffer."
"Documentation has become much better, but can always use some improvement."
"The documentation of Vertica is an area with shortcomings where improvements are required."
Apache Hadoop is ranked 5th in Data Warehouse with 32 reviews while Vertica is ranked 4th in Data Warehouse with 83 reviews. Apache Hadoop is rated 7.8, while Vertica is rated 8.2. The top reviewer of Apache Hadoop writes "A file system for data collection that contains needed information and files". On the other hand, the top reviewer of Vertica writes "A user-friendly tool that needs to improve its documentation part". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and Oracle Big Data Appliance, whereas Vertica is most compared with Snowflake, SQL Server, Amazon Redshift, Teradata and SingleStore. See our Apache Hadoop vs. Vertica report.
See our list of best Data Warehouse vendors and best Cloud Data Warehouse vendors.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
SQreamDB is a GPU database. Of course, it is not suitable for real-time OLTP.
Cassandra is best suited for OLTP use cases, when you need a scalable database (instead of SQL Server or Postgres).
SQream is a GPU database suited to OLAP purposes. It is best for a very large data warehouse and very large queries that need massively parallel activity, since GPUs excel at massively parallel workloads.
Also, SQream is quite cheap, since we need only one server with a GPU card; the better the GPU card, the better, because we get more parallel capacity. It's only for a very big data warehouse, not for small ones.
Your best DB for 40+ TB is Apache Spark, Drill and the Hadoop stack, in the cloud.
Use the public cloud provider's elastic object store (S3, Azure Blob, Google Cloud Storage) and then stand up Apache Spark on a cluster sized to run your queries within 20 minutes. Based on my experience (Azure Blob storage, Databricks, PySpark), you may need around 500 32 GB nodes to read 40 TB of data.
Costs can be contained by running your own clusters, but Databricks manages clusters for you.
I would recommend optimizing your 40 TB data store into the Databricks Delta format after an initial parse.
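The node count in the answer above can be sanity-checked with back-of-envelope arithmetic. This is only a rough sketch using the figures quoted (40 TB of data, 500 nodes with 32 GB of memory each); real capacity planning also depends on file format, compression, and query shape.

```python
# Rough sizing arithmetic for the scenario above: 40 TB of data on a
# cluster of 500 workers with 32 GB of memory each (the reviewer's figures;
# treat these as illustrative assumptions, not a sizing formula).
TB_IN_GB = 1024

data_gb = 40 * TB_IN_GB        # 40 TB of raw input
nodes = 500                    # worker count from the answer above
node_mem_gb = 32               # memory per worker

cluster_mem_gb = nodes * node_mem_gb
data_per_node_gb = data_gb / nodes

print(cluster_mem_gb)              # 16000 -> ~15.6 TB of total cluster memory
print(round(data_per_node_gb, 2))  # 81.92 -> each node scans ~82 GB of input
```

With roughly 82 GB of input per 32 GB node, each worker reads its share of the data in multiple partitions rather than holding it all in memory, which is consistent with targeting a 20-minute query window rather than interactive latency.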
Morten, the most popular comparisons of SQream can be found here: www.itcentralstation.com
The top ones include Cassandra, MemSQL, MongoDB, and Vertica.