Multi-tenancy is the most valuable feature for us. A lot of people want to do their own monitoring, and we're able to offer it to them as a service: we manage the infrastructure side of everything and give each of them their own little slice of the system. They can take advantage of the distributed components that we have around the world and set up their own hosts. That's the number one thing we like about it.
We've looked for alternatives, and we really can't find anything else with multi-tenancy that's this easy to use, especially from the end user's perspective.
Improvements to My Organization
We can now offer monitoring as a service. When people come to us and say they need to monitor a test bed, we're able to give that to them, which in the past would have required me or my colleague to work with them, implement the monitoring, and then support it going forward. There was no self-service at all and we'd always get engaged. It's allowed us to focus on other things while the system provides that service. The biggest benefit is that it's given us some time back.
Room for Improvement
There is one feature that I've been requesting for a while now. Going back to multi-tenancy, right now the tenants can't create their own service checks, so they still have to go through us for that. Once we get a check in place, they can use it across all of their hosts, so it's a one-time setup with them. It would be nice if tenants could implement their own service checks and there was some way to introduce those into the distributed system from a tenant perspective, not just a global admin perspective. That's one feature that I think is missing, and I've mentioned it quite a few times to the guys over in their ops unit.
Another thing that we thought would be kind of neat is some kind of integrated logging service. I know it's a monitoring service, but we've already got this distributed system in place, and it would be kind of cool if there was something that could catch syslogs. It could have a module to view the syslogs from all of the different sites and act as a syslog aggregator or something along those lines, kind of like what Splunk does. We've already got all of these components around the world, so if we could just leverage them for this, that would be kind of neat too.
Use of Solution
We've used it for about two years.
For the most part it’s stable. We had some issues when I took the product over from an engineer who left and we were dealing with scalability. We had to address how it was architected. Opsview was pretty good about getting with me and helping me come up with a plan to correct that. Since we rolled out that solution, it's been pretty rock solid. We haven't had many stability issues.
Every once in a while, we'll hit some kind of weird, wonky bug and we'll get with them, and either there's already a fix or one will come in an update.
We had a scalability problem about a year and a half ago, when the number of monitored hosts was growing and the database was not able to keep up. Going forward, though, I don't see any problems with scalability. The downside of continuing to add more sites and distributed slave components at those sites is that reload times increase. I have to continue to use Nagios to prevent this, but it would be nice if it didn't have to be this way.
As we scale, we can set up slave clusters, which has worked so far.
Customer Service and Technical Support
Nine times out of ten, you put the ticket in and you get somebody who's very knowledgeable and able to help. I think maybe once or twice there's been a ticket that didn't get the attention it needed, but that's probably going to happen anywhere.
Overall, I'd say it's been very good and they've been very responsive. Right now, we're going through an upgrade process that requires a big migration. I put a ticket in and they contacted me within 30 minutes. It was not the exact resolution that I needed, but at least they started the conversation.
We have to set it up again because we're migrating and upgrading, which is complex and has many moving pieces. We're moving from v4 to v5, and I have to learn the differences and the underlying components.
Without prior experience, setup would be pretty complex. v5, however, offers an auto-installation function, which makes new installations a whole lot easier. The problem is that we have existing employees still using v4, so the auto-installation doesn't work for them, and we need to get down into the nitty-gritty.
As long as your expectations match what the system is designed to do, it will probably meet them. But if you want it to do things beyond what it's designed to do, you might need to look at something else.