What is most valuable?
The most important feature is the distributed, highly available monitoring clusters offered in the enterprise editions. Because we run multiple sites around the world and every second of service disruption costs us money, this feature is critical.
The second most valuable feature for us is the extensibility of the service checks and event handling (auto-correction).
The third most important feature is the single point of configuration.
How has it helped my organization?
The fully extensible event handling has enabled us to reduce on-call incidents by more than 90%. Setting up monitoring for a new site now takes a few hours instead of days.
What needs improvement?
The graphing feature needs work. It was rewritten in release 5.0, but we have not yet deployed that release.
For how long have I used the solution?
We started with Opsview Community (which no longer exists), so overall we have been using the Opsview platform for five years. It has been our only production monitoring system for more than three years.
What do I think about the stability of the solution?
We have not experienced any significant issues. We have had one slave crash in five years, and thanks to the redundancy, there was no loss of monitoring. The master failed once, but because the monitoring slave clusters are independent, we lost only central management. Each slave can run with its own web interface.
How are customer service and technical support?
Technical support: 10/10.
Which solution did I use previously and why did I switch?
We previously used a combination of Big Brother, Ganglia, Cacti, a Syslog server, and an in-house monitoring solution. We selected Opsview over its competitors primarily for its fully redundant distributed architecture. Second on our list was replacing many systems with a single point of configuration.
How was the initial setup?
Setting up the product itself was quite simple. A great deal of development was needed, however, to recreate the custom checks performed by our previous in-house monitoring system.
What about the implementation team?
I performed all development and implementation for the company. Since Opsview can use any Nagios check, a huge number of existing scripts is available. You can check out some of my work at https://github.com/nguttman/Nagios-Checks.
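Because Opsview accepts Nagios-compatible checks, a custom check is simply a program that follows the Nagios plugin convention: print one status line (optionally followed by `|` and performance data) and exit with 0 (OK), 1 (WARNING), 2 (CRITICAL), or 3 (UNKNOWN). The following is a minimal sketch, assuming a simple "higher is worse" threshold; the script name `check_value.py`, its argument order, and the helper names are hypothetical, and real plugins usually support the fuller Nagios threshold-range syntax.

```python
import sys

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
STATUS_NAMES = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(value, warn, crit, label="value"):
    """Return (exit_code, status_line) for a metric where higher is worse.

    Simplified thresholds (an assumption): >= crit is CRITICAL,
    >= warn is WARNING, otherwise OK. The status line is readable text,
    then '|' and performance data in label=value;warn;crit form.
    """
    if value >= crit:
        code = CRITICAL
    elif value >= warn:
        code = WARNING
    else:
        code = OK
    line = f"{STATUS_NAMES[code]} - {label} is {value} | {label}={value};{warn};{crit}"
    return code, line


def main(argv):
    # Hypothetical interface: check_value.py <value> <warn> <crit>
    try:
        value, warn, crit = (float(a) for a in argv[1:4])
    except ValueError:
        # Missing or non-numeric arguments: report UNKNOWN, as plugins conventionally do.
        print("UNKNOWN - usage: check_value.py <value> <warn> <crit>")
        return UNKNOWN
    code, line = evaluate(value, warn, crit)
    print(line)
    return code


if __name__ == "__main__":
    sys.exit(main(sys.argv))
```

For example, running `check_value.py 95 80 90` prints `CRITICAL - value is 95.0 | value=95.0;80.0;90.0` and exits with status 2, which the monitoring server maps to a CRITICAL service state.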
What was our ROI?
When it comes to monitoring a real-time product like VoIP, I don't think in terms of ROI; I think in terms of SLA and sleepless nights. The product has significantly improved our effective SLA while virtually eliminating the dreaded 2 AM call.
What other advice do I have?
While the product is not perfect, it is better than any other product I have seen or worked with. If you need geographically distributed, highly available monitoring, this product is great. If you do anything remotely real-time, you should want your monitoring to be highly available.