The most important feature is the distributed, highly available monitoring clusters offered in the enterprise editions. We run multiple sites around the world, and every second of service disruption costs us money, so this feature is critical.
The second most valuable feature for us is the extensibility of the service checks and event handling (auto-correction).
The third most important feature is the single point of configuration.
Improvements to My Organization
The fully-extensible event handling has enabled us to reduce on-call incidents by more than 90%. Setting up monitoring of a new site now takes a few hours, when it used to take days.
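As a rough illustration of what auto-correction through event handling can look like, here is a minimal sketch in Python. It assumes the common Nagios convention of passing the service state, state type, and attempt number to the handler. The function names, the attempt threshold, and the `restart` callback are hypothetical and are not taken from our production handlers.

```python
def should_autocorrect(state, state_type, attempt, soft_attempt_trigger=3):
    """Decide whether a handler should try to fix the service.

    Assumed policy: act only on CRITICAL, either once during the soft
    retries (on the chosen attempt) or when the state goes HARD.
    """
    if state != "CRITICAL":
        return False
    if state_type == "HARD":
        return True
    return state_type == "SOFT" and attempt == soft_attempt_trigger


def handle_event(state, state_type, attempt, restart):
    """Call the hypothetical restart callback if the policy says so."""
    if should_autocorrect(state, state_type, attempt):
        restart()
        return "restarted"
    return "no action"


if __name__ == "__main__":
    actions = []
    # Simulate a service failing through three soft retries.
    for n in (1, 2, 3):
        print(n, handle_event("CRITICAL", "SOFT", n, lambda: actions.append(n)))
    print("restart attempts:", actions)
```

Triggering on a single soft attempt gives the fix a chance to work before anyone is notified, while the hard-state branch acts as a last try before the alert goes out.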
Room for Improvement
The graphing feature needs work, although it has been rewritten in release 5.0, which we have yet to deploy.
Use of Solution
We started with Opsview community (which no longer exists), so overall we have been using the Opsview platform for five years. It has been our only monitoring system in production for more than three years.
We have not experienced any significant issues. We have had one slave crash in five years, and due to the redundancy, there was no loss of monitoring. We had the master break once, but due to the independence of the monitoring slave clusters, all we lost was the central management. Each slave can be run with its own web interface.
Customer Service and Technical Support
10/10 Technical Support
We previously used a combination of Big Brother, Ganglia, Cacti, a Syslog server, and an in-house monitoring solution. We selected Opsview over its competitors primarily because of its distributed, fully redundant architecture. Second on our list was the ability to replace many systems with a single configuration point.
The setup of the product itself was quite simple. A great deal of development was needed to recreate the custom checks that had been performed by our previous in-house monitoring system.
I performed all development and implementation for the company. Since Opsview can use all Nagios checks, there is a huge number of scripts available. You can check out some of my stuff at https://github.com/nguttman/Nagios-Checks.
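For anyone who has not written a Nagios-style check before, here is a minimal sketch in Python. It relies only on the standard plugin convention: one line of status text, optional performance data after a pipe, and an exit code of 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The function names and the "higher is worse" threshold logic are illustrative assumptions and are not copied from the repository above.

```python
# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit):
    """Map a measured value onto a state (assumes higher is worse)."""
    if value >= crit:
        return CRITICAL
    if value >= warn:
        return WARNING
    return OK


def format_output(name, state, value, warn, crit):
    """Status text, then perfdata (label=value;warn;crit) after the pipe."""
    return f"{name} {LABELS[state]} - value={value} | value={value};{warn};{crit}"


def main(argv):
    """Parse VALUE WARN CRIT from argv, print the result, return the exit code."""
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN, not a failure.
        print("CHECK UNKNOWN - usage: check_sketch.py VALUE WARN CRIT")
        return UNKNOWN
    state = evaluate(value, warn, crit)
    print(format_output("CHECK", state, value, warn, crit))
    return state

# A real plugin would finish with: sys.exit(main(sys.argv))
```

In a real check, the value would come from probing the service rather than from the command line. The exit code is what the monitoring server acts on.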
When it comes to monitoring a real-time product like VoIP, I don't think in terms of ROI; I think in terms of SLA and sleepless nights. The product has significantly improved our effective SLA, while virtually eliminating the dreaded 2 AM call.
While the product is not perfect, it is better than any other product I have seen or worked with. If you need geographically distributed, highly-available monitoring, this product is great. If you do anything remotely real-time, then you should want your monitoring to be highly available.
Disclosure: I am a real user, and this review is based on my own experience and opinions.
Jan 04 2016