The NZA.ARULES() function is the fastest implementation of association rule mining (Apriori or tree-based) I have worked with. Compared with the R association rules package, it is more than 1,000x faster. The issue with the R implementation, and with R in general, is that the functions do not scale to large data sets. For some of our arules runs, R would generate a 100,000 x 100,000 element matrix, which is very, very inefficient programming.
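To put that matrix size in perspective, here is a back-of-the-envelope calculation. It assumes each cell of a dense 100,000 x 100,000 matrix is stored as an 8-byte number; the cell size is an assumption for illustration, and R's actual storage may differ.

```python
# Back-of-the-envelope memory estimate for a dense item-by-item matrix.
# Assumption: every cell is an 8-byte value (e.g. a double-precision count).
def dense_matrix_bytes(n_items: int, bytes_per_cell: int = 8) -> int:
    """Bytes needed to hold an n_items x n_items matrix in dense form."""
    return n_items * n_items * bytes_per_cell


n_items = 100_000
size_bytes = dense_matrix_bytes(n_items)
print(f"{n_items:,} x {n_items:,} cells -> {size_bytes:,} bytes (~{size_bytes / 1e9:.0f} GB)")
# prints: 100,000 x 100,000 cells -> 80,000,000,000 bytes (~80 GB)
```

Even at 1 byte per cell, the matrix would need about 10 GB before any computation happens.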
Improvements to My Organization
Any in-database machine learning functions help by eliminating the data transport issue. NZA goes one better by letting you use a tuned engine (software + hardware) to run the functions. As a result, we can run algorithms that would be impossible to run on other platforms.
Room for Improvement
We are using an older version. Some improvements have already been implemented, such as the addition of neural networks. Another improvement would be to make the existing algorithms even faster; for example, kmeans can actually be parallelized. If the algorithm is reworked this way, it should leverage the MPP architecture of Netezza and run much faster.
Also, some algorithms, such as the NZA.ARULES() function, are faster than almost any other method or system, whereas others, such as the kmeans function, are not as fast and could probably be optimized for an MPP architecture.
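The reason k-means lends itself to an MPP architecture is that each iteration's centroid update only needs per-cluster coordinate sums and point counts, and those are additive across partitions. The sketch below is a minimal illustration under that assumption: the data is already split into slices (a stand-in for Netezza's data slices), each slice computes its partial statistics independently, and a coordinator merges them. The function names are hypothetical, the slices are processed sequentially here for simplicity, and this is not how NZA's kmeans function is implemented.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def nearest(p: Point, centroids: Sequence[Point]) -> int:
    """Index of the centroid closest to p (squared Euclidean distance)."""
    return min(
        range(len(centroids)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(p, centroids[i])),
    )


def partial_stats(points: Sequence[Point], centroids: Sequence[Point]):
    """Per-slice step: assign points, return per-cluster coordinate sums and counts.
    Only this step touches raw data, so each slice could run it in parallel."""
    k, d = len(centroids), len(centroids[0])
    sums = [[0.0] * d for _ in range(k)]
    counts = [0] * k
    for p in points:
        i = nearest(p, centroids)
        counts[i] += 1
        for j in range(d):
            sums[i][j] += p[j]
    return sums, counts


def merge_and_update(partials, centroids: Sequence[Point]) -> List[Point]:
    """Coordinator step: add up the partial statistics and recompute centroids."""
    k, d = len(centroids), len(centroids[0])
    total_sums = [[0.0] * d for _ in range(k)]
    total_counts = [0] * k
    for sums, counts in partials:
        for i in range(k):
            total_counts[i] += counts[i]
            for j in range(d):
                total_sums[i][j] += sums[i][j]
    # An empty cluster keeps its previous centroid.
    return [
        tuple(s / total_counts[i] for s in total_sums[i]) if total_counts[i] else tuple(centroids[i])
        for i in range(k)
    ]


def parallel_kmeans(slices, centroids, iterations: int = 10) -> List[Point]:
    """k-means where each iteration is a per-slice map followed by a merge."""
    centroids = [tuple(c) for c in centroids]
    for _ in range(iterations):
        # Sequential here for simplicity; on an MPP system these calls are independent.
        partials = [partial_stats(s, centroids) for s in slices]
        centroids = merge_and_update(partials, centroids)
    return centroids


# Hypothetical example: six 2-D points spread over three slices.
slices = [
    [(0.0, 0.0), (0.0, 1.0)],
    [(10.0, 10.0), (10.0, 11.0)],
    [(1.0, 0.0), (11.0, 10.0)],
]
final = parallel_kmeans(slices, [(0.0, 0.0), (10.0, 10.0)], iterations=5)
print(final)
```

Because adding up partial sums gives the same totals as summing all points at once, this produces the same centroids as a serial k-means step; only the per-slice loop needs to be distributed.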
Use of Solution
I have used it for 2-3 years on and off, including 1 year spent on actual integration into another tool for mass consumption.
For deployment, ensure you follow the instructions. The version of Netezza must be matched with the version of NZA. We tried to install a newer version of NZA not supported by the version of Netezza we had and it did not work.
Customer Service and Technical Support
N/A. We did not engage IBM tech support, mainly because past experience showed it is faster, easier and much less stressful to just figure things out yourself.
RapidMiner, R, DB2 Intelligent Miner and SSAS data mining were all tested for association rules. RapidMiner and R did not pass: they use matrix math, which caused performance degradations of more than 100x. DB2 Intelligent Miner, SSAS data mining, and likely also NZA follow the original Apriori method of using sets, not matrices.
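For readers unfamiliar with the set-based approach, the following is a minimal sketch of textbook Apriori frequent-itemset mining. Each transaction is a frozenset, and the algorithm only counts candidate itemsets whose smaller subsets are already frequent, so no item-by-item matrix is ever built. This is a generic illustration, not the NZA, DB2 Intelligent Miner or SSAS implementation; the function name `apriori`, the absolute `min_support` count, and the sample baskets are assumptions for illustration.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, Set


def apriori(transactions: Iterable[Set[str]], min_support: int) -> Dict[FrozenSet[str], int]:
    """Return every itemset that appears in at least `min_support` transactions."""
    baskets = [frozenset(t) for t in transactions]

    # Level 1: count individual items.
    counts: Dict[FrozenSet[str], int] = {}
    for basket in baskets:
        for item in basket:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    frequent = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(frequent)

    k = 2
    while frequent:
        prev = list(frequent)
        candidates = set()
        # Join step: merge pairs of frequent (k-1)-itemsets into k-itemsets.
        for i in range(len(prev)):
            for j in range(i + 1, len(prev)):
                union = prev[i] | prev[j]
                # Prune step: keep only candidates whose (k-1)-subsets are all frequent.
                if len(union) == k and all(
                    frozenset(sub) in frequent for sub in combinations(union, k - 1)
                ):
                    candidates.add(union)
        # Count step: subset test of each candidate against each transaction.
        counts = {c: 0 for c in candidates}
        for basket in baskets:
            for cand in candidates:
                if cand <= basket:
                    counts[cand] += 1
        frequent = {s: c for s, c in counts.items() if c >= min_support}
        result.update(frequent)
        k += 1
    return result


# Hypothetical market-basket data.
baskets = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]
frequent_sets = apriori(baskets, min_support=3)
for itemset, count in sorted(frequent_sets.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

The prune step is what keeps the work proportional to the itemsets that actually occur, rather than to every possible pair of items.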
Our DBA set it up, but it seemed very simple. I helped design the overall architecture. If you stick with NZA as a “function repository DB” concept and do not put data on it (as recommended in the install manual), you will be OK.
In-house. Don’t overthink it. Just apply some basic architectural principles and give it a try. If you find something not working, make adjustments.
Pricing, Setup Cost and Licensing
I had no visibility into pricing. Netezza in general, though, is very cost effective compared with other MPP platforms like Hadoop. Hadoop requires a very “non-enterprise” culture to implement and a team of designers/engineers, whereas Netezza requires maybe 1-2 admins.
I would have to understand the specifics of the organization, the data size, the tasks, etc. to make any recommendations.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Jan 17 2016