The most valuable aspects of this product are the integration pieces. They are pretty much limitless. I also enjoy the reporting aspects, as I use this product to assist in RCAs.
The most valuable feature of the product is how they allow you to do the scheduling compared to the other companies. I switched from PagerDuty to VictorOps because the time that it would take for me to create a full-on schedule for a single item would be anywhere from 45 minutes to an hour and a half in PagerDuty. That same schedule takes me no more than ten minutes in VictorOps.
That’s because with VictorOps you’re in a single pane to create that calendar or schedule. In PagerDuty, once you set up a single session, you have to go back and set up another session, and that's your escalation policy. In VictorOps, the escalation policy is a drop-down menu where you add a task in the same window where you see all of your people and the times the on-call rolls over; you're on the same screen the entire time. In PagerDuty you have to go to three separate screens to do the same thing, and it just takes so much longer.
Also, the transmogrification feature was really awesome. We use that quite heavily, so we can make sure the messages coming in are getting properly formatted. We're able to add whatever little customization that we want to for that type of message, so that the drops can accept it, and then give us valuable feedback based upon that.
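For readers who haven't seen this kind of rule-based rewriting, here is a minimal sketch of the general idea in Python. It is not VictorOps's rule syntax or API; the rule layout, the field names (`subject`, `team`, `severity`), and the `transform` helper are all assumptions made up for illustration.

```python
from fnmatch import fnmatch

# Hypothetical rules: when a field matches a wildcard pattern,
# set or overwrite other fields on the incoming alert.
RULES = [
    {"field": "subject", "pattern": "*disk*",
     "set": {"team": "storage", "severity": "critical"}},
    {"field": "subject", "pattern": "*backup*",
     "set": {"severity": "info"}},
]


def transform(alert, rules=RULES):
    """Return a copy of the alert with every matching rule applied in order."""
    result = dict(alert)
    for rule in rules:
        # Match case-insensitively by lowercasing the value;
        # the patterns above are written in lowercase.
        value = str(alert.get(rule["field"], "")).lower()
        if fnmatch(value, rule["pattern"]):
            result.update(rule["set"])
    return result


example = transform({"subject": "Disk usage at 95% on db01",
                     "severity": "warning"})
```

In this sketch the incoming alert matches the first rule, so it gets tagged for a storage team and its severity is raised, while the original subject is left alone.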
Another great feature, which we got because we were part of their beta program, is being able to start a call from within the timeline. That was awesome. Once you start a conference bridge, it shows the users you notified within VictorOps. You can see in the timeline, for RCA purposes, who joined the call, when they joined, and anything they typed into the timeline concerning that incident. I currently use it for RCA tracking. For any incident that takes place, we tag an incident number to it, tell everybody associated with that incident to put all of their findings under that incident number, and then print out a report at the end as our RCA.
Improvements to My Organization
We are using this product to streamline our RCA process. We have all our engineers enter notes against the specific monitor we have designated as the issue, and then collaborate across teams to add notes to that problem. Later we can run reports on the entire timeline of the event, so we aren’t having to do the work twice. I can upload the logs to management for them to review what took place and who was involved in troubleshooting.
Obviously MTTA and MTTR are huge in the DevOps community, so one thing we like to do is see what our MTTA and MTTR are per route and try to figure out where we have gaps and why. For example, we may have an employee who simply doesn't answer his on-call, which counts against your MTTA and ultimately against your MTTR. Then there are the reports: for the RCA, our upper management expects to see those media files when issues arise.
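To make the per-route numbers concrete, here is a rough sketch of how mean time to acknowledge and mean time to resolve (commonly written MTTA and MTTR) could be computed from incident records. The record layout, field names, and sample data are hypothetical and are not a VictorOps export format; both averages are measured from the moment the incident triggered, which is one common convention but not the only one.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean

# Hypothetical incident records, made up for illustration.
incidents = [
    {"route": "database", "triggered": "2024-01-01T10:00:00",
     "acked": "2024-01-01T10:04:00", "resolved": "2024-01-01T10:30:00"},
    {"route": "database", "triggered": "2024-01-02T09:00:00",
     "acked": "2024-01-02T09:06:00", "resolved": "2024-01-02T09:50:00"},
    {"route": "network", "triggered": "2024-01-03T22:00:00",
     "acked": "2024-01-03T22:20:00", "resolved": "2024-01-03T23:30:00"},
]


def minutes_between(start, end):
    """Elapsed minutes between two ISO 8601 timestamps."""
    delta = datetime.fromisoformat(end) - datetime.fromisoformat(start)
    return delta.total_seconds() / 60


def mtta_mttr_by_route(records):
    """Average acknowledge and resolve times, in minutes, grouped by route."""
    ack_times = defaultdict(list)
    resolve_times = defaultdict(list)
    for rec in records:
        ack_times[rec["route"]].append(
            minutes_between(rec["triggered"], rec["acked"]))
        resolve_times[rec["route"]].append(
            minutes_between(rec["triggered"], rec["resolved"]))
    return {
        route: {"mtta": mean(ack_times[route]),
                "mttr": mean(resolve_times[route])}
        for route in ack_times
    }


report = mtta_mttr_by_route(incidents)
```

A route whose acknowledge average is far above the others is the kind of gap described above, such as an on-call engineer who is slow to answer.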
Room for Improvement
Because we're in the beta program, we've submitted quite a few changes that we'd like them to make, and they actually included us in that process. The only feature we are currently waiting on is really an annoyance factor: on the timeline on the main dashboard, there’s an ACK All button. It should not be able to ACK others' alerts, only your own. I have sent several emails to their DevOps group and support about it.
It's something we've brought up twice with them now. After the second time, they really took it seriously. From what I heard from the product team, we weren't the only ones to complain about that. Now that other customers are saying something about it, they've really said, "Okay yeah, we need to do something about this." And they are, and I believe that they will.
Everything they've ever said they were going to do, they've done.
Use of Solution
I believe it has been a year now.
There were no issues with the deployment.
There were some issues with stability that were quickly addressed and haven’t arisen since. It's been stable as long as we've used it. I think they've only had one time that they said, "Hey, we need to do a maintenance," and it lasted maybe five minutes. It wasn't even enough for us to even notice.
I have not encountered any scalability issues because, really, that's on their side, not ours. When we need to scale as users are added, all I have to do is add the users and they bill me appropriately. It's done. They've made it entirely too easy.
We have separate call centers; we kept 41 on our side and I believe there's a little over 100 on the corporate IT side of the organization. So I would say about 150 users total.
Customer Service and Technical Support
Technical support is an A+ or a 10/10. If I could give an 11/10, I would. It is what I love about them. I got a call from their support team out of the blue on a Saturday afternoon, and they basically said "Hey, we've been seeing some strange activity on your account, some strange stuff is happening. Wanted to bring that to your attention, is that expected behavior?" It was something we were doing, we were working on something internally, and they caught it. That tells me that they were actively monitoring those systems. They called us up to say, "Hey, is everything okay? Anything we can help with to help you all do this?" I was like, holy crap, this is good customer service.
It probably took me 30-45 minutes to set up all aspects of the alerting. It was not complex at all, which is why we chose VictorOps. It was straightforward; it's literally point-and-click. Set up the users, make sure that they get the notification, and make sure that they log in and set up their individual profiles. From there, it was just building the schedules. As mentioned elsewhere, I built out the team's schedules that had been in PagerDuty, and it took me 30 minutes. It would've taken me almost all day in PagerDuty to do that.
We used an in-house team. Take your time and think of all the possible avenues by which you can get alerts. Literally anything that can send an email can be alerted on and “transmogrified” to fit your needs.
Other Solutions Considered
We were a customer of PagerDuty and we chose VictorOps because of the ease of administration. It takes me five minutes to perform an analysis.
Before choosing VictorOps, we also looked at another company, but the name evades me. Because we were already on PagerDuty, we looked at VictorOps, did an in-depth POC for about 30 days, fell in love with it, and that was pretty much it.
Play with it, break it down, try to break it as much as you can. Actually hook up a limited set of production systems and have them alerting to it. Play around with the scheduling pieces and compare them to the competitors. It's a no-brainer. Everybody I've ever talked to, including the company that offered me a job, is actually using VictorOps. I asked, "Why? Why are you using VictorOps?" He said, "Because it's just so damn easy."
I haven’t given it five stars because nobody's perfect. It better be laying golden eggs if it's going to get five stars.