We performed a comparison between Apache Hadoop and Teradata based on real PeerSpot user reviews.
Find out in this report how the two Data Warehouse solutions compare in terms of features, pricing, service and support, ease of deployment, and ROI.
"It is a file system for data collection. There are nodes in this cluster that contain all the information, directories, and other files. The nodes are based on the MySQL database."
"We selected Apache Hadoop because it is not dependent on third-party vendors."
"What comes with the standard setup is what we mostly use, but Ambari is the most important."
"Most valuable features are HDFS and Kafka: Ingestion of huge volumes and variety of unstructured/semi-structured data is feasible, and it helps us to quickly onboard a new Big Data analytics prospect."
"The most valuable feature is the database."
"It's open-source, so it's very cost-effective."
"Since both Apache Hadoop and Amazon EC2 are elastic in nature, we can scale and expand on demand for a specific PoC, and scale down when it's done."
"As compared to Hive on MapReduce, Impala on MPP returns results of SQL queries in a fairly short amount of time, and is relatively fast when reading data into other platforms like R."
"Teradata can be deployed on-premise, on the cloud, or in a virtual machine, which means customers can move without having to create their architecture all over again."
"It's very, very fast"
"Viewpoint, the detailed query logs and performance statistics are valuable features."
"It's a pre-configured appliance that requires very little in terms of setting-up."
"Teradata solutions help organizations reduce IT, operations, and maintenance costs; enhance on-time delivery of products and services."
"Teradata is a great, industry-leading data warehousing product that has MPP architecture."
"I found all parts --loading, transformation, processing & querying work in parallel, and end-to-end-- to be valuable."
"Teradata's best feature is its speed with historical data."
"The upgrade path should be improved because it is not as easy as it should be."
"The solution needs a better tutorial. There are only documents available currently. There's a lot of YouTube videos available. However, in terms of learning, we didn't have great success trying to learn that way. There needs to be better self-paced learning."
"In the next release, I would like to see Hive more responsive for smaller queries and to reduce the latency."
"The main thing is the lack of community support. If you want to implement a new API or create a new file system, you won't find easy support."
"From the Apache perspective or the open-source community, they need to add more capabilities to make life easier from a configuration and deployment perspective."
"It would be good to have more advanced analytics tools."
"General installation/dependency issues were there, but were not a major, complex issue. While migrating data from MySQL to Hive, things are a little challenging, but we were able to get through that with support from forums and a little trial and error."
"The stability of the solution needs improvement."
"The following could be better: licensing, architecture openness, integration with other tools."
"The increasing volumes of data demand more and more performance."
"It's primarily designed for big projects and therefore, the pricing is pretty high. It's not suitable for smaller companies."
"Teradata should focus on functionality for building predictive models because, in that regard, it can definitely improve."
"Data ingestion is done via external utilities and not by the query language itself. It would be more convenient to have that functionality within its SQL dialect."
"I've been using the same UI for 20 years in Teradata. It could use some updating. Adding more stability around Teradata Studio would be outstanding. Teradata Studio is a Java-based version of their tool. It's much better now, but it still has some room for improvement."
"Teradata is an expensive tool. Like, if you're already using Microsoft products like Windows, they'll market all their products together. And with the rise of cloud technologies, companies will adopt solutions that offer them some privileges or facilities. Similar to how SAP does it in the market, so do Microsoft and other companies. Even Oracle and other such tools are quite commonly seen compared to Teradata's competitors in everyday solutions."
"Teradata's pricing is quite high compared to Redshift, Synapse, or GCP alternatives."
Apache Hadoop is ranked 5th in Data Warehouse with 33 reviews while Teradata is ranked 3rd in Data Warehouse with 54 reviews. Apache Hadoop is rated 7.8, while Teradata is rated 8.2. The top reviewer of Apache Hadoop writes "Handles huge data volumes and create your own workflows and tables but you need to have deeper knowledge". On the other hand, the top reviewer of Teradata writes "Offers seamless integration capabilities and performance optimization features, including extensive indexing and advanced tuning capabilities". Apache Hadoop is most compared with Azure Data Factory, Microsoft Azure Synapse Analytics, Oracle Exadata, Snowflake and BigQuery, whereas Teradata is most compared with SQL Server, Snowflake, Oracle Exadata, MySQL and Teradata IntelliFlex.
We monitor all Data Warehouse reviews to prevent fraudulent reviews and keep review quality high. We do not post reviews by company employees or direct competitors. We validate each review for authenticity via cross-reference with LinkedIn, and personal follow-up with the reviewer when necessary.
Hi Tomasz,
Collibra can scan all these sources. See this link: https://marketplace.collibra.c...
Also, Erwin Data Intelligence Suite can harvest most (if not all) of these sources:
https://www.erwin.com/products...
Hi Tomasz Rabong,
I believe that if you have a developer team in Amundsen it would be possible.
Alternatively, you can look at Informatica EDC or at Data Virtualization Data Catalog (from Denodo).
@Tomasz Rabong, it depends upon the actual requirements of the data catalog.
As far as we have experienced, SAP BO 4.0 is way ahead in solving complex architectural, clustering, warehousing, and mining problems, whereas Tableau Server 2022.1 is really awesome and has recently added features to solve complex problems.
As a team, we prefer SAP BO for billions of records.
Hi @Tomasz Rabong, I hope you're well and safe.
Specifically, if you need any help regarding Infogix Data360 Govern, please let me know.
Cheers.