What is our primary use case?
We have our own private data centers located around the world. They host our solutions for our customers. We have a NOC that monitors the applications and services on each server. The primary use of the solution is to trigger incidents and to resolve issues before the customer notices.
How has it helped my organization?
We use Centreon as a base for almost all of our monitoring, and we use it to trigger incidents. We work with ServiceNow. We shifted from the old, unsupported open-source version of Centreon to the new version. We use the built-in plug-ins that Centreon has: the monitoring plug-ins and the component-specific plug-ins. We didn't have to write and maintain the checks ourselves; we were able to use what Centreon had. That's one thing that it improved in our organization.
We have used it from the beginning, so I can't really compare it to anything before. But when we first installed the UI, it allowed us to see the big picture, to understand what's critical and what's not critical, and to build more and more checks, more and more output, and more hosts for it. It's scalable. Centreon allowed us to do it without having to look for another solution.
We have about 10,000 alerts a month coming from Centreon. For us, especially compared to other systems, it gets us the information for a specific alert: What is alerting on the server, what's working or not working. The number of clicks which we need to do to get that information is significantly lower. If you have an alert on server A, in another solution, you have to search for server A, and then search for what's not good and what's good. In Centreon, it takes one or two clicks, one or two transactions, done by the NOC user, to get that information. When you're talking about doing that 10,000 times a month, that's a significant reduction in the amount of work.
It's flexible for infrastructure monitoring. We can write our own checks. It's based on Nagios, and it's fully open-source. We do prefer to use the plug-ins, because then we don't have to maintain them. But we can write anything regarding server-level and application health ourselves. We have the flexibility.
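To illustrate what "writing our own check" can look like, here is a minimal sketch of a Nagios-style check plugin in Python. It is not one of our actual checks: the disk-usage scenario, the function names, and the default thresholds are all assumptions made for the example. What it relies on is the standard Nagios plugin convention: print one status line (optionally followed by performance data after a `|`) and exit with 0 for OK, 1 for WARNING, 2 for CRITICAL, or 3 for UNKNOWN.

```python
"""Hypothetical Nagios-style check: disk usage on one mount point."""
import argparse
import shutil

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3
LABELS = {OK: "OK", WARNING: "WARNING", CRITICAL: "CRITICAL", UNKNOWN: "UNKNOWN"}


def evaluate(used_pct, warn, crit):
    """Map a usage percentage to a plugin exit code."""
    if used_pct >= crit:
        return CRITICAL
    if used_pct >= warn:
        return WARNING
    return OK


def format_output(code, path, used_pct, warn, crit):
    """Build the one-line status text, with performance data after the pipe."""
    return (f"DISK {LABELS[code]} - {path} is {used_pct:.1f}% full"
            f" | used_pct={used_pct:.1f}%;{warn};{crit};0;100")


def main(argv=None):
    """Parse thresholds, measure usage, print the status line, return the code."""
    parser = argparse.ArgumentParser(description="Check disk usage (illustrative).")
    parser.add_argument("--path", default="/")
    parser.add_argument("--warning", type=float, default=80.0)   # assumed default
    parser.add_argument("--critical", type=float, default=90.0)  # assumed default
    args = parser.parse_args(argv)
    try:
        usage = shutil.disk_usage(args.path)
    except OSError as exc:
        # Anything the check cannot measure is reported as UNKNOWN.
        print(f"DISK UNKNOWN - cannot read {args.path}: {exc}")
        return UNKNOWN
    if usage.total == 0:
        print(f"DISK UNKNOWN - {args.path} reports zero capacity")
        return UNKNOWN
    used_pct = 100.0 * usage.used / usage.total
    code = evaluate(used_pct, args.warning, args.critical)
    print(format_output(code, args.path, used_pct, args.warning, args.critical))
    return code
```

Saved as a script, the file would end with `import sys; sys.exit(main())` so that the monitoring engine receives the exit code. Every custom check like this is one more thing to maintain, which is why we prefer the ready-made plug-ins wherever they exist.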
What is most valuable?
When we started using it, our work was based completely on Nagios. With Nagios, by design, if you have five or six data centers, you have to open five or six web pages to see what's going on. What we like about Centreon is that this is all included in one page, a single site, one dashboard. You don't have to jump from one specific dashboard to another.
I also really like the filtering capabilities of it. You can easily tell what's critical next to what's okay, the state of the services. It's very easy to get the whole picture quickly.
In terms of the data visualization features, since we're not looking for anything too particular or too complex, it works for us. It's very easy to find exactly what alerts you have. It's very easy to filter by a specific alert. It's very easy to search. It's very easy to configure a specific relation between alerts, to see what's good and what's bad at a given place.
I would compare it to something like Excel, perhaps. Visually, it's very easy to work with. Maybe you can't do things that are too complicated or have some sort of BI, but it has what we're looking for. What we need to understand is: Where is the alert, is there anything else affected, is it clear? And then resolve it as fast as we can. It's a very straightforward, non-complex GUI.
What needs improvement?
There are improvements that they need to make to their API. When we're using different systems and we want to disable monitoring for a specific server, we still can't do that through the API. That's something that's lacking. We have to be creative and think of other ways.
And now that we're looking into switching to the world of containers, which is a different type of monitoring altogether, I hope that they have some sort of scalable solution for it. In a container world, the container is irrelevant. It could just be destroyed and another one can come up in its place. It's about the history, the log, and the service itself; that's what is valuable. That's something that they have to think about, although we're not there yet ourselves.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
The stability is good. The old Centreon, for us, wasn't stable but, again, we're talking about an old system that wasn't supported, that wasn't built on best practices. The current solution is stable for us.
I don't think we have had any availability issues since we installed the new Centreon. The only time we did was when someone was doing work on Centreon on our side. But other than for maintenance, we haven't had any downtime. We have had some slowness, but not a time when the system wasn't available.
It is a very critical system for us, so if there is a problem with Centreon, we do have to deal with it right away, because it's our eyes. I would know if there was some big issue with Centreon.
What do I think about the scalability of the solution?
In terms of the scalability, so far it looks like it's been doing well. We use the best practices that they send us.
We have a problem because we're growing a lot, server-wise, and we have to accommodate the capacity and rearrange it every time. Sometimes the engines are loaded. But it's something that we have to keep watching because it's installed on our servers, not in a cloud. So we have to make sure that the sizing is what it should be.
Maybe another thing that would be helpful would be a way for Centreon to monitor itself, to tell us when we need to add more engines, or when we need to add more CPUs - scale up, scale down - based on the Centreon infrastructure. I'm sure they have this in their best practices, but it would be much better if this was part of an actual alert, so we would know beforehand and not have to proactively check it every once in a while.
How are customer service and technical support?
Their support is very good, they're very knowledgeable. We do use them quite often and they're very quick to answer and very quick to take over the desktop and to investigate it themselves. They seem to be very technical.
We wish they had 24/7 support just in case, but we have our own failover design, so the chances that the checks aren't going to work in one way or another are very slim.
Which solution did I use previously and why did I switch?
We used the regular Nagios, two or three of them for each of our data centers. I wasn't there at the time the switch was made to Centreon, but I can guess that it was because Centreon is a unified solution. You can now configure checks and do it on one page. With Nagios, at least the old one, you had to have a different site for each data center, so you had to manage three or four things.
How was the initial setup?
The initial setup was pretty easy. It's very similar to Nagios. The initial setup of Centreon is not difficult at all.
We cleaned all our infrastructure and built Centreon from scratch, but we already knew what we were doing. For the deployment of an empty environment, it was very quick. It took a few days. The difficulties were on our side, making our specific checks and fitting them into the plug-ins, but that didn't have to do with Centreon. We had to go back and do some re-engineering. For us, it was easy.
In terms of our strategy for restructuring, we had a lot of checks that were irrelevant, servers that were irrelevant, and checks that weren't written correctly. Our strategy was, first of all, to have the minimum number of checks needed; second, to have a naming convention; and third, wherever possible, to use a Centreon plug-in and not write our own. It took us a while because we had a lot to review. We have a lot of different applications with a lot of different checks. It was more of an in-house project of processes and procedures. We took advantage of the new Centreon to clean up everything and do it right.
What about the implementation team?
We have the skills. We had a consultant from Centreon come in - that was part of the contract - for three days, and he showed us some tricks, some best practices, and answered some questions.
Specifically for us, because we knew what we were doing, I don't really think we got a lot of value from the consultant. But I can tell you, if someone has no clue what's going on with Centreon, the consultant would be very helpful.
What was our ROI?
If we're looking at Centreon and how we managed to integrate it with ServiceNow, if we needed to buy another monitoring tool, that would probably be a cost of $20,000 or $30,000 a year. We didn't have to do that. Our escalation rate from our NOC is very low, it's about two percent, so I have to give Centreon some credit.
What's my experience with pricing, setup cost, and licensing?
I think Centreon's pricing is fair, especially given the criticality of our system. They were cheaper than the other solutions.
I understand Centreon is going to North America now. They were smaller when we got it, and the pricing was fair. It took us a while to get in contact with sales, which was a little weird, but once we did and they knew we were serious, the pricing was fair.
The licensing terms were pretty straightforward. I believe it was based on the number of hosts.
Which other solutions did I evaluate?
The other solutions we tested give you a unified GUI and a platform, like "Nagios as a Service." They all do basically the same thing.
We also had Opsview, after acquiring a company that used it. We took all their checks and migrated them to Centreon, and then we closed Opsview. It was pretty easy to migrate from it - as long as it's Nagios, it's pretty easy. We had to do fixes here and there, but it took only a few man-days of development work. It was not a complicated project. Doing this saved us a lot of money. They were paying more for Opsview than we were for Centreon, for about 10 percent of the service. We had a chance to consolidate to Opsview or Centreon, and it was clear that we should consolidate to Centreon.
What other advice do I have?
Take what you have and challenge it. If you're using another system and you decide to move to Centreon, even if your system is similar, don't put your junk on Centreon or any other tool. Go through your processes, go through the system, see what the system is good at, see what it's not so good at, and try to use plug-ins and best practices. Make sure you do an in-house cleaning first. Don't just dump everything on another system and expect it to work.
We've trained a lot of people on Centreon. It was very easy for everyone. It wasn't something that someone specific had to get used to. When we were looking for different solutions - because we ran out of support for Centreon - we tested Centreon against a few other solutions, and then we understood the advantage of Centreon, especially the GUI.
We already have a system, ServiceNow, that does a lot of the reports and consolidates a lot of the incidents for us. We have to do it in one system and we chose that specific system because a lot of other components are relying on it. But, from our perspective, Centreon gives us exactly what we need. I wouldn't need to over-complicate it.
We have around 70 users who use Centreon in one way or another. Ten to 12 are using it daily; going through it is one of their main tasks. The rest are on-call or escalation staff. They go to Centreon when they get a specific call, to get more information. In terms of their roles, we have the NOC team that uses it, and then we have the Cloud Operations team, which is the second tier of our infrastructure cloud. They use it when they receive escalated incidents. Two or three members of the DevOps team use it to administer the system. And some of the managers look at it every once in a while to see what is alerting during a major incident.
Regarding staff for maintaining the solution, it depends. When you have, say, a new product, and you have new service checks and need to connect it to new host templates, that might take some time, but that's a business requirement. When it comes to just maintaining Centreon itself, it's not too much work. It's one of many tools that our DevOps maintain. I don't think they have too much of a headache with it. There are things here and there but it's not something that is very time-consuming.
In terms of how much of the solution we're using, you can always improve it. It's a matter of the time that you have to put into it. Right now, it's giving us enough. We have tried to learn a few things about it. It's a lot of work, and we have had to do other things instead. We are happy with the solution, with where we are at the moment. If we had more time we'd seek to improve it and use the new features they have. But we haven't had time to work on it. You have to configure it, you have to maintain it, and write processes. That wasn't at the top of our list. We're using Centreon for what we're using it for, and we're using other tools to complement it.
Overall, I would rate Centreon at nine out of ten. They have excellent support, fair pricing for what you get. It's not some sort of machine that does analytics and discovers the servers and these kinds of things. If you want something, arrange a call, talk about it. When they have a new feature they're very excited about it. It's open-source, they're contributing to that and releasing things.
If you're good at something, just stick with it. Don't make any critical changes. If it's working well, don't try to break it, or be something you're not, and reinvent everything. They haven't changed the UI so much, and that's what's good about it. They didn't try to reinvent it or change something. They took what's good about Nagios and added the things that needed to be added.
There's always room for improvement. They're not perfect, and that's why I'm not giving them a ten, but they are good. It wasn't just me who decided that we should go on with Centreon. It was myself and three DevOps engineers, and we all came to the same decision, that we should continue with them. Looking back at it, we'd probably do the same. It's just what we need. I just hope that in the future they'll be able to adapt to the world of containers and more complicated monitoring.