What is our primary use case?
We are a fairly large IT shop with several thousand servers and a ticketing system, Remedy OnDemand. About 900 people or so work in IT, and about one-third of them do support in one way or another or have to deal with incidents. So the use case for this tool was to notify teams or individuals that there was an incident in progress that they needed to attend to. Usually, it was for incidents that had the kind of priority that needed immediate attention.
Natively, Remedy will send out an email. But if you need to get somebody's attention because a server is on the brink of falling over, that doesn't cut it.
Our use case was essentially incident notification.
I was there to transition to the tool. I did all the use cases for it and then I handed off the reins of power to my successor.
How has it helped my organization?
Having a quick response to urgent events - events that were not necessarily critical but that could become critical if not dealt with urgently - was important for us.
Prior to having a notification system in place, we either had to have an operations person checking all the queues in Remedy or someone subscribing to emails from Remedy and then doing manual call-outs to people at 3 am because a server died.
We had a fairly sophisticated ticket flow. We had a monitoring system with an event correlation and event management system that would then automatically create incident tickets. The incident tickets, based on their level of urgency, would then be channeled out through the Everbridge IT alerting platform, which would then trigger escalations based on the urgency of the incident. For example, if there was a P1 incident where the data center was down, it would escalate much more quickly than if there was a P3 issue that you needed to look at quickly to avoid a P1.
If we were to compare no IT alerting to IT alerting of any kind, the latter makes a significant difference. In our case, we used to have real, live operators who would call people out. Now, the operations staff is there just to manage some escalations but it really removes the human from the equation, from the moment of detection to notification.
Before, we'd have a human looking at a console of some kind and that person would then have to look up a contact list to find out who was the owner of the alert, find their number, call them and, if nothing happened, figure it out, and say, "Okay, I've got to escalate." They would then have to call the second person in line, and so on. It was not really a manageable situation. Having an alerting solution connected to our ticketing system made the flow much more effective and really did improve our overall response time and uptime.
What is most valuable?
There are quite a few valuable features. In terms of the general notifications, one of the things that was interesting and good is that you can configure the tool to escalate if no action is taken within a certain time period. That avoids sending off an alert that nobody deals with and where nobody knows that nobody has dealt with it.
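To make the escalate-on-no-action idea concrete, here is a minimal sketch in Python. It is not Everbridge's API or configuration format; the `EscalationStep` class, the `targets_notified` function, the contact names, and the wait times are all assumptions made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EscalationStep:
    target: str        # hypothetical contact or team name
    wait_minutes: int  # how long to wait for an acknowledgment before moving on


def targets_notified(steps, minutes_elapsed, acknowledged_at=None):
    """Return the targets notified so far, `minutes_elapsed` after the alert fired.

    A step is notified once every earlier step has timed out without an
    acknowledgment. Escalation stops as soon as someone acknowledges.
    """
    notified = []
    step_start = 0
    for step in steps:
        # This step has not been reached yet.
        if step_start > minutes_elapsed:
            break
        # Someone acknowledged before this step would have started.
        if acknowledged_at is not None and acknowledged_at <= step_start:
            break
        notified.append(step.target)
        step_start += step.wait_minutes
    return notified


# Hypothetical policy: names and timings are illustrative only.
policy = [
    EscalationStep("primary on-call", 15),
    EscalationStep("secondary on-call", 15),
    EscalationStep("operations desk", 30),
]
```

With this policy, `targets_notified(policy, 20)` returns the primary and secondary on-call, while `targets_notified(policy, 60, acknowledged_at=10)` returns only the primary, because the acknowledgment stops the chain.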
You can program in rotations, shifts, and scenarios of different kinds and it allows you to page multiple people, or people in sequence, or a group of people simultaneously.
Another good feature Everbridge has is deduplication. We had cases where everybody on a team had the same phone number. Maybe they were passing a cell phone around. When the tool sees that, it doesn't call the same phone number 15 times. It will call it one time, because it will see, as part of the list of devices and device hours, that it's a duplicate.
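The idea behind that deduplication can be sketched in a few lines of Python. This is only an illustration of the concept, not how Everbridge implements it; the `unique_devices` function and the sample team are hypothetical.

```python
def unique_devices(contacts):
    """Collapse (person, phone) pairs so each phone number is dialed only once."""
    seen = set()
    to_call = []
    for person, phone in contacts:
        # Keep digits only, so "555-0100" and "555 0100" count as the same device.
        digits = "".join(ch for ch in phone if ch.isdigit())
        if digits in seen:
            continue
        seen.add(digits)
        to_call.append((person, digits))
    return to_call


# Hypothetical team sharing one cell phone.
team = [("Ana", "555-0100"), ("Ben", "555 0100"), ("Cy", "555-0199")]
```

Here `unique_devices(team)` keeps one entry for the shared number and one for the distinct number, so the shared phone rings once instead of once per team member.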
Once your users are defined, you can pop up a map and draw a circle on the map and notify everybody within that area. That geo feature is really useful if you have a particular incident where there is a protest on the street, a building on fire, a Hazmat spill. These are all scenarios that I've lived through.
It was crucial at that time to have a solution where one could say, "Let me draw a radius around the impacted building and have everybody in that radius contacted." That was a huge win.
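Conceptually, "draw a radius and contact everybody inside it" comes down to a distance check against each user's location. The sketch below uses the haversine great-circle formula; the function names, the coordinates, and the assumption that each user has a single stored latitude and longitude are mine, not Everbridge's.

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def distance_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points, using the haversine formula."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def people_within(people, center, radius_km):
    """Return the names of people located within `radius_km` of `center`."""
    lat0, lon0 = center
    return [
        name
        for name, lat, lon in people
        if distance_km(lat0, lon0, lat, lon) <= radius_km
    ]


# Hypothetical locations: one person about 0.1 km away, one about 11 km away.
staff = [("Ana", 10.001, 20.0), ("Ben", 10.1, 20.0)]
impacted_building = (10.0, 20.0)
```

Calling `people_within(staff, impacted_building, 1.0)` selects only the person inside the one-kilometer circle.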
What needs improvement?
The feature that xMatters has that Everbridge doesn't have, or has in a limited way, is a method of funneling some alerts, as an FYI, to other stakeholders who are not necessarily prime actors in an incident. For example, you have a support team that supports critical application X, and you have somebody who is actually the application owner. The application owner does not normally get called out in the middle of the night to be told that their application is down, unless it's super-critical and it's going to stay down. But they would be receiving a copy of the notification that was sent out so they'd know that something happened overnight, or that something is happening right now.
For how long have I used the solution?
Less than one year.
What do I think about the stability of the solution?
It's been performing like a champ. We haven't had any outages. I had lunch with my buddies last week, and there has been nothing significantly wrong. It's been flowing like it should.
The old 2012 solution was using somewhat dated technology and it was starting to choke on a regular basis. We really didn't want that with the volume of incident tickets that we were generating.
What do I think about the scalability of the solution?
We didn't have any scalability issues with it. I don't have a comparison point, but it easily handled everything we threw at it.
How are customer service and technical support?
Everbridge's tech support was really excellent. They were on the ball, they had answers to our questions. They made things happen that they probably hadn't done beforehand. I found them really collaborative and very much a pleasure to work with.
I found Everbridge to be very responsive during the implementation phase, and post-implementation, whenever we had questions, we were able to reach out either via our managed service provider or directly to Everbridge. As a longtime tech guy - I've got over 30 years in the business - they were really a blast to work with. It's always great to work with people who are competent and who have some kind of empathy for your reality.
I'm not sure if I was dealing with US people, Toronto people, or overseas people. There were a lot of people from different places coming onto phone bridges. At a certain point it was hard to tell who was a managed service provider, who was Remedy, who was Everbridge. It was just quite the multinational effort.
It could have been a real horror story, and it turned out very well. We were starting to have doubts at one point, and then they called in the cavalry. We had a few extra resources. And things went off pretty much without a hitch.
Which solution did I use previously and why did I switch?
Before, we were using xMatters, which is another notification tool, a very old version that was resold to us through a managed service provider. Our xMatters solution was hosted by them and it was at end-of-life. It was the last xMatters on-prem offering back in 2012 or 2013.
When we migrated we looked at different solutions, but the Everbridge solution was the most cost-effective at the time. It didn't have, from my perspective, any other clear advantages over xMatters or PagerDuty.
In our environment it made financial sense and, with the templates, it made operational sense. It worked just fine. It was surprisingly, blazingly fast. The throughput was pretty incredible. The time from when the incident system - the ticketing system - poked Everbridge to say that there was something going on, until Everbridge started to notify, was very short.
I wasn't even aware that Everbridge was doing an IT alerting product up until last year. I had always known them to be a mass-notification type of company. It was actually a smart move on their part to leverage their mass-notification capability - which, by definition, means you're alerting a whole ton of people in a very short period of time - into an IT alerting product.
In the past, that's where we would run into issues with our on-prem xMatters installation. Sometimes, when there were too many alerts, a lot of queuing would happen. I didn't see any instances while I was there - and we did tests with a lot of events - of much queuing happening on the Everbridge side.
I don't really consider Everbridge to be a relatively new product. Everbridge had an alerting product beforehand. All they did was enhance their alerting product and add functionality required for it to become an IT alerting product. But they started off with a really good base. They managed the transition to an IT-alerting product fairly gracefully.
How was the initial setup?
The setup was straightforward once you understood that it is a different paradigm. When you're used to things being a certain way - say you're used to Windows and you switch to Mac - you have a little bit of an adjustment period and then things become intuitive. It was the same here. There's nothing inherently overly complex about the tool itself. But if you're coming from another tool with a different underlying paradigm, you do have to wrap your head around some different concepts. It took a while to catch on to how to properly use the tool and to convey to Everbridge what exactly we were expecting as a result.
The deployment took about two months.
There were a lot of steps in there, including a massive cleanup of the old notification system, so we wouldn't transport garbage into the future; a migration of over 1,000 users, which is quite a bit; all the technical onboarding that had to happen so that people would know how to use the new tool; and exposure to the new functionalities. The training was done simultaneously with the integration of the tool. We had a Dev, a QA, and a Prod environment. We ran it through its paces in all three to make sure it worked out.
The overall project took longer than that because the biggest problem was deciding on the tool. But once the tool was decided on, it was about a two-month effort to convert.
The actual technical implementation strategy was really just making sure we were passing the right variables and tweaking templates until they were just so.
What about the implementation team?
We used our managed service provider, and we had people from Everbridge and Remedy directly involved. But we did not have any third-party consultants.
Considering the knowledge of the people who were involved in the implementation from the Everbridge side, the transparency with which they worked with us, and the rapidity of the responses and corrections or modifications or tweaks, it was really a very pleasant experience.
What was our ROI?
It replaced something that was already doing a very similar job, so the ROI is hard to quantify. We already had something that notified people. Compared to having nothing, the ROI would have been substantial.
But let's look at it this way: If you have 1,000 users and you're paying $25 a head, you're paying $25,000 per month. If you have access to metrics on incident management and on how much it costs a large organization to deal with a major incident, you can weigh that cost against the fee. For us, having a notification tool in place reduced our number of major incidents by about 20 percent, year over year.
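That back-of-the-envelope math can be written out as a small Python sketch. The 1,000 users, $25 a head, and 20 percent reduction come from the figures above; the baseline of 50 major incidents a year is a purely hypothetical number chosen to make the arithmetic visible, and the function name is my own.

```python
def break_even_cost_per_incident(users, price_per_user_month,
                                 baseline_major_incidents, reduction=0.20):
    """Annual license cost divided by the number of major incidents avoided.

    If a major incident costs the organization more than this amount,
    the tool pays for itself on avoided incidents alone.
    """
    annual_cost = users * price_per_user_month * 12
    incidents_avoided = baseline_major_incidents * reduction
    return annual_cost / incidents_avoided


monthly_cost = 1_000 * 25  # 25,000 per month, as in the example above

# Hypothetical baseline of 50 major incidents per year (not a figure from our shop).
break_even = break_even_cost_per_incident(1_000, 25, baseline_major_incidents=50)
```

Under that assumed baseline, the break-even point works out to roughly $30,000 per avoided major incident; plug in your own incident metrics to get a meaningful number.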
It's helpful when you can notify and have solid proof of notification. Then you have accountability. What was particularly interesting was that the gains were seen because people were then able to be notified of things that were urgent but not a P1 yet, still at a pre-impact level. The classic example would be a disk that is filling up. You've got a critical app and if the disk fills up, you're toast. Monitoring picks it up, creates a ticket, dispatches it off to a team, the team gets notified. If nobody responds within 10 or 15 minutes, it gets escalated. So for sure, within half an hour, somebody would look at it. Just doing that greatly reduced the number of disk-space incidents we had.
What's my experience with pricing, setup cost, and licensing?
In terms of additional costs, I was just the guy who was the pain in the back, telling them, "No, we need this functionality. You forgot this. These are the use cases that need to be represented." But apart from the integration costs and, obviously, using resources from Remedy and using resources from Everbridge, regarding licensing costs we just had that flat fee. Once we integrated it was just a standardized fee.
Which other solutions did I evaluate?
Our need was very unsophisticated in the sense that we wanted to notify a predefined set of people based on predefined criteria. Within Everbridge you could accomplish that using something called templates. It had an automated flow-through.
What xMatters has that Everbridge doesn't have is something interesting called a subscription, where you can get an FYI notification of an event or incident based on matching keywords or other elements of the message.
We did a quick market scan and we saw PagerDuty out there, xMatters was out there. I don't remember if Opsgenie was available at the time. But there were a bunch of them that all seemed to coalesce around the same price point and, for whatever reason, Everbridge came in as less expensive and they did integrations with Remedy OnDemand.
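As a rough illustration of what a keyword-based subscription does, here is a minimal Python sketch. It is not the xMatters API; the `fyi_recipients` function, the subscriber names, and the rule that every keyword must appear in the message are assumptions for the example.

```python
def fyi_recipients(message, subscriptions):
    """Return subscribers whose keywords all appear in the message (case-insensitive)."""
    text = message.lower()
    return [
        subscriber
        for subscriber, keywords in subscriptions.items()
        if all(keyword.lower() in text for keyword in keywords)
    ]


# Hypothetical subscriptions: stakeholders who want a copy, not a call-out.
subscriptions = {
    "app-owner": ["payments"],
    "dba-lead": ["database", "p1"],
}
```

For a message like "P2 incident: payments API latency", `fyi_recipients` returns only the application owner, who gets a copy of the notification without being paged.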
That was good for us because in a large shop with a good flow of incident tickets, for the people who are resolving these things it becomes cumbersome to take notifications, log in, go into the ticketing system and assign the ticket to themselves, and then work on the problem. With the Everbridge integration the person who acts on the alert becomes the owner of the ticket and the ticket changes status. That facilitated the visibility of how the incidents were being handled at the bank.
We also needed device discrimination based on severity of ticket, time discrimination based on the severity of ticket, and impact of ticket. You're not going to page out somebody for a low-level event.
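Device and time discrimination can be pictured as a small rule table. The Python sketch below is only an assumption about what such rules might look like; the channel names, the priority thresholds, and the 08:00 to 18:00 business-hours window are hypothetical and not taken from our actual templates.

```python
from datetime import time


def devices_for(priority, now):
    """Pick notification channels from ticket priority and time of day."""
    business_hours = time(8, 0) <= now < time(18, 0)
    if priority == 1:
        # Highest severity: use every channel, day or night.
        return ["voice", "sms", "email"]
    if priority == 2:
        return ["sms", "email"]
    if priority == 3 and business_hours:
        return ["sms", "email"]
    # Low-level events, or P3 outside business hours: email only, nobody gets paged.
    return ["email"]
```

With these rules, a P1 at 3 am still triggers a voice call, while a P3 at the same hour only sends an email.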
What other advice do I have?
My chief advice would be to know your use cases. A tool like Everbridge can do just about anything. All of these tools are very powerful tools. Start small, pick something that is attainable and that you can measure, and then build from there. Sometimes people try too hard to do everything at the same time, to implement every possible functionality on day one. It never works.
Also, if you have a poorly defined use case you have a problem. The tool itself is good but, while Microsoft Word is a decent tool, it doesn't make me a writer. That's how I see Everbridge. It's a decent tool, but it doesn't mean that it makes you an alerting god if you don't know how you want to use it and how you plan to use it or what your expected results are.
You really have to think through the process, the whole process. We're lucky that our incident management processes were defined. People knew what to expect. I had some very specific use cases. I needed shifts, I needed rotations, I needed device discrimination, depending on the type of alert. I needed targeted escalations. I needed escalations to our NOC for certain types of events. All of these things had to be figured out beforehand. If you discover them as you go along, it impacts the design. If you're designing for a fuzzy need you're going to have a bad time when it comes down to implementation.
In terms of improvement in remediation time, we had already seen that. Our use case was the same use case we had before.
It was the primary means of notification for our ticketing system. In terms of incidents coming from automation, from monitoring, in any given month there would be 6,000 to 10,000 tickets, depending on the month and what happened.
Something to know about these systems is that once they're configured, they're pretty much set-and-forget. After that, it's just add a user, remove a user. It's very rare in our specific use case that we'd have to change a template.
In terms of IT alerting, I'd give Everbridge a solid eight out of ten. I'd give it a nine if the subscription functionality was a bit better. It's lightweight from an end-user perspective. It's not overly busy. It's straightforward in the way it communicates and it's heavily customizable.