BMC TrueSight Operations Management Root Cause
How has the solution affected your ability to identify the root cause of operational events or problems?
In terms of root cause analysis, we're still working that through. But mean time to repair is going down because the cause is becoming much more obvious. Between the prioritized events that people are looking at and the service models, which show the actual impacts across relationships, it's becoming much easier. Depending on the event, it's gone from about four to five hours down to 20 minutes. When it works, it's significant. A lot of it is cultural. When you go from everybody monitoring their own stuff and not talking to anybody else to everybody looking at the same single pane of glass, and you put a Service Desk on top of that to perform incident management and coordinate things, then between the technology, the culture, and the process changes, you're going to see some pretty dramatic improvements.
As for root cause, when a team is engaged in monitoring to its full extent, we're usually able to get to root cause pretty darn quickly. For example, if a team has many servers that could potentially be impacting an application or a business service, tracking something down across those multiple servers and multiple owners could be really tedious and time-consuming. It would be on the order of hours, or at least many minutes, depending on the scope of the issue. With well-implemented monitoring for our Sev-One apps, they're able to get to the solution almost immediately. If we have monitoring set up properly, the actionable event will tell them precisely where a critical component has failed and they can resolve it. Where it's a different type of incident that we might not have a particular monitor for, they're able to use the performance data, availability data, and other related alerts to get to their issue much faster than they used to. Having a good monitoring implementation has made a world of difference to our operations teams. So much so that, if you think back five years, which is an eternity in the IT world, when there was a Sev-One incident, someone would walk around tapping people on the shoulder all over the floor. That was very time-consuming. But now, in a well-monitored environment, they're able to collaborate quickly, say, "It looks like this is the problem right here," and get right to the root cause.
We have improved our ability to get to a root cause because of the way their tools work. When a problem happens, it lights up the affected service model in red. If you follow the diagram down to the lowest impacted member of the tree, you'll see which component is at the bottom. So, if it's a database saying, "I'm out of disk space," that may create all types of chaos. Following that tree down, you'll see the lowest level is the database server, and it has a disk space event. Right there, that's the root cause of all your application issues. So, it has helped us get to the root cause more quickly.
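The "follow the tree down" idea from this review can be sketched in a few lines of Python. This is only an illustration of the reasoning, not how TrueSight works internally and not its API: the `Node` class, the `is_impacted` and `probable_causes` helpers, and the example service model are all hypothetical names made up for the sketch. It assumes a node counts as impacted if it has its own event or any impacted child, and that the probable cause is the deepest impacted node carrying its own event.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """One component in a hypothetical service model tree."""
    name: str
    events: list = field(default_factory=list)    # active events on this node
    children: list = field(default_factory=list)  # components it depends on


def is_impacted(node):
    # A node "lights up red" if it has its own event or any child is impacted.
    return bool(node.events) or any(is_impacted(c) for c in node.children)


def probable_causes(node):
    """Walk down impacted branches and return the lowest nodes with events."""
    impacted_children = [c for c in node.children if is_impacted(c)]
    if not impacted_children:
        # Nothing below is impacted, so this node is the bottom of the branch.
        return [node] if node.events else []
    causes = []
    for child in impacted_children:
        causes.extend(probable_causes(child))
    return causes


# Hypothetical model: an application that depends on a web server and a database.
db = Node("database-server", events=["disk space low"])
web = Node("web-server")
app = Node("order-application", children=[web, db])

for cause in probable_causes(app):
    print(cause.name, cause.events)
```

In this made-up model the application shows as impacted, but the walk ends at the database server and its disk space event, which matches the scenario the reviewer describes.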
The entire root-cause analysis functionality within the tool is quite useful. It really comes down to how admins want to leverage it. There are what I call "old-school admins" who want to get on the box and solve it themselves. Then you have the "new-school admins" who go straight to the monitoring tools. The tool clearly shows you the root cause analysis: this is the probable cause. They're then able to go remediate it more quickly. We use that extensively within the operations team and the products team...