What is our primary use case?
Enterprise reporting data warehouse accessed through Business Objects and MicroStrategy, plus data mining using SQL. It served as the data repository for a single customer view. It also contained staging tables, some of which were designed like an ODS: they held all data from the source system and were updated nightly. The appliance contained over 12 TB of uncompressed data (less than 4 TB compressed).
How has it helped my organization?
Reports that used to take between 30 and 60 minutes, or would time out, on the Oracle database previously used for the enterprise DWH now run consistently in seconds or in less than five minutes.
What is most valuable?
High-performance RDBMS appliance optimized for data warehousing and enterprise reporting. Very simple to manage huge volumes of data without having to worry about indexing and partitioning. Automated compression of tables without any custom scripting or manual intervention. We achieved almost 3x compression effortlessly, which meant that 12 TB of data compressed to around 4 TB.
What needs improvement?
It could do better at supporting concurrent update queries. We had to stagger our ETL loads to prevent queuing of jobs and random failures.
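As a rough illustration of what staggering the loads can look like, here is a minimal Python sketch that caps how many load jobs run at once. It is not the scheduler we used; the function name `run_staggered`, the cap of two concurrent loads, and the idea of passing each load as a callable are assumptions made for the example.

```python
from concurrent.futures import ThreadPoolExecutor

# Assumed cap for illustration only; the right number depends on the
# appliance and the workload and has to be found by observation.
MAX_CONCURRENT_LOADS = 2


def run_staggered(jobs, max_concurrent=MAX_CONCURRENT_LOADS):
    """Run each callable in `jobs` with at most `max_concurrent` in flight.

    Jobs beyond the cap wait in the pool's queue instead of all hitting
    the database at once. Results are returned in the order the jobs
    were submitted; an exception raised by a job is re-raised here.
    """
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        futures = [pool.submit(job) for job in jobs]
        return [future.result() for future in futures]
```

Each job would wrap one ETL load (for example, a function that runs one load script against the box). The point of the cap is only to keep the number of simultaneous update-heavy sessions low.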
Also, it would have been good if the admin application showed more detail on the validity and usage of zone maps (this may have been implemented in later versions of the admin app).
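For readers unfamiliar with zone maps, the sketch below shows the general idea in Python: keep the minimum and maximum value for each block of rows and skip blocks that cannot contain matches. This is a simplified model, not Netezza's actual implementation; the names `Extent`, `build_extents` and `scan_range` and the tiny extent size are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Extent:
    """A block of stored values plus its zone-map entry (min and max)."""
    min_value: int
    max_value: int
    rows: list


def build_extents(values, extent_size):
    """Split values into fixed-size extents, recording min/max for each."""
    extents = []
    for start in range(0, len(values), extent_size):
        chunk = values[start:start + extent_size]
        extents.append(Extent(min(chunk), max(chunk), chunk))
    return extents


def scan_range(extents, low, high):
    """Return values in [low, high] and the number of extents actually read."""
    matches = []
    extents_read = 0
    for extent in extents:
        # The zone-map check: skip extents whose range cannot overlap the filter.
        if extent.max_value < low or extent.min_value > high:
            continue
        extents_read += 1
        matches.extend(v for v in extent.rows if low <= v <= high)
    return matches, extents_read
```

In this model, data stored in roughly sorted order lets most extents be skipped, while data where every extent spans the full value range forces every extent to be read. That difference is the kind of "validity and usage" detail it would have been useful to see in the admin application.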
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
The database is stable unless there are hundreds of queries running in parallel.
What do I think about the scalability of the solution?
Every query is a full table scan. If the table contains mostly integers, then performance is good. If the number of users is in the thousands, then it may be better to use cubes or other solutions to service reporting needs.
How is customer service and technical support?
Before being acquired by IBM, Netezza Corporation had exceptional support and used to respond very quickly (in less than 30 minutes) to production issues. Round-the-clock support and monitoring were offered, and support tickets were handed over very professionally between engineers working across time zones. After the acquisition by IBM, support has not been as responsive, but there weren't as many issues because the box was stable.
Which solutions did we use previously?
Previously, Oracle was used as the data warehousing platform. Its performance was poor and did not meet the needs of the enterprise reporting and analytics user community. My customer switched to Netezza mainly for performance, and it was a big improvement.
How was the initial setup?
As the box was very heavy, the datacenter flooring required additional reinforcement. The box runs Linux and the initial setup is quite straightforward. ODBC drivers on the servers (ETL or reporting) that connect to the box may need to be upgraded.
What about the implementation team?
Implemented this through a vendor team. As there is no need to spend time on partitioning and indexing, a lot of vendor time was saved. Table scripts for partitioned Oracle tables run into hundreds or thousands of lines of code, and we used to be charged accordingly. A Netezza table script is much simpler, so we saved money there. Reviewing table scripts for performance and best practice was also easier, as there is only a limited set of best practices to implement for high performance. So even vendor teams with a low or medium level of expertise can deliver properly as long as they understand how MPP works. Governance effort is definitely lower with Netezza than with Oracle or SQL Server.
What was our ROI?
ROI is high because analyst productivity improved drastically. As mentioned before, queries which used to run for 30 to 60 minutes now run in seconds or within a few minutes, about the duration of a typical pop song. So analysts can ask more questions of the data per hour than they could with Oracle.
Also, the compression feature saved us a lot of money on per-terabyte costs for the data.
What's my experience with pricing, setup cost, and licensing?
From a cost-per-terabyte perspective, Netezza is definitely more expensive than Hive on Hadoop. But given its simplicity, its ANSI SQL compliance, and the high performance that can be achieved with less tuning, it may be worth it.
Which other solutions did I evaluate?
My customer upgraded from Netezza 4.x to TwinFin 6.x.
What other advice do I have?
Netezza is a great option for data warehousing, but give due attention to concurrency and find out the peak load the database may have to handle. Also, check whether performance is acceptable for APIs and web services. Performance may not scale for thousands of single-row lookups, as the database is better suited to complex, aggregated data warehousing queries.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Jan 30 2018