BMC TrueSight Operations Management Benefits
We brought the product in to handle the following: We're in 35,000 data centers today. We have 16,000 customers and we support about 400,000 assets. Those are big numbers. The pieces of storage equipment we provide have something native from the equipment manufacturers, the OEMs, called "phone home." When these devices start having a problem, they send out an email that says, "I'm having this problem." To put that into perspective, we were trending towards 2,000,000 emails at the end of 2017, and growing. We would have to read 2,000,000 emails to find out what was going on. Fewer than seven percent of those actually contained a problem we really had to read, and well below one percent were actually a service event.

Before we brought in TrueSight, there were 8.2 touches via email or phone call after a ticket had come in, from exchanging log files with the customer through to our resolving the issue. And on the customer side, they had somebody who had to look at the equipment to make sure it was actually working. From those 8.2 touches, we're down to two with TrueSight.

And here's the big difference. Instead of these devices sending all of that information out in emails, it's captured in the Knowledge Module, the policy, and the agent, on the customer side of the firewall. When TrueSight installs, it takes a week to come up with what's called a dynamic baseline. It says, "For this piece of equipment in your environment, these are the key performance indicators that we're going to watch." We can see events live when they happen. There are predictive and proactive warnings of failures or potential problems. But the only thing that's ever communicated to us is when there's a failure. So we can see all the chatter, and we can look at it by customer, but we don't really need to.
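TrueSight's actual dynamic-baseline algorithm is proprietary, but the general idea described here - learn a normal range for each KPI from a training window, then only surface readings that fall outside it - can be sketched roughly as follows. The sample data and the three-sigma band are illustrative assumptions, not the product's real method:

```python
from statistics import mean, stdev

def learn_baseline(samples, k=3.0):
    """Learn a simple normal range (mean +/- k standard deviations)
    from a training window of KPI readings."""
    m, s = mean(samples), stdev(samples)
    return (m - k * s, m + k * s)

def is_anomalous(reading, baseline):
    """Flag a reading only when it falls outside the learned range."""
    low, high = baseline
    return reading < low or reading > high

# One week of simulated latency readings for one KPI, in ms.
training_week = [10, 12, 11, 9, 13, 10, 11, 12, 10, 11]
baseline = learn_baseline(training_week)

print(is_anomalous(11, baseline))   # normal chatter: stays on the customer side
print(is_anomalous(95, baseline))   # clear deviation: worth raising an event
```

In this toy version, ordinary fluctuation never crosses the firewall; only a reading far outside the learned band would generate an event.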
And if it's a predictive event, it will send us a notice saying, "We think this part is going to fail in two weeks," and we can help that customer. But ultimately, what we get is a service ticket: "Failed part at this location. Here's the part number, the serial number, and the recommended remediation." That comes into our support center. Eventually, when we have it all set up the way we envision it, the info will come into the support center, a ticket will be created and automatically assigned to a tech, and the tech will reach out to the customer. We haven't turned that on yet. Right now, it comes in and we read it. We call the customer and say, "You have a failure." In most cases, the customer didn't know it yet, because it's that fast. We call them up and say, "You have a problem. We have the part, and when would you like Larry to come on site?" Because it's storage, they have to schedule downtime. Then we go out on site, we fix it, and we're done. So it's two physical touches now: We call them, and they say, "Yes, it's completed."

So 2,000,000 emails have pretty much gone away, and it all gets done at the customer site. What we see now, instead, is a couple of hundred to 1,000 service events, versus millions of emails. And we have the right part, the right chassis, the right location. In our industry, there is about a 75 to 78 percent first-time fix rate, meaning repair personnel do not have to go back to a given site within a week. As a company, we were at about an 86 percent first-time fix rate. With TrueSight, we've never gone below 98 percent. It's all done with software. I read all of the service emails from our customers. Customers are used to finding a log file and talking to our expert - and if a customer has five different pieces of equipment, there are five different experts involved. Now, they send a note in and they'll say, "This is resolved. I just want to make sure this process is working the way it's supposed to.
I didn't call anybody. You called me to tell me I had a problem that I wasn't quite aware of. Now, I have a part, it's fixed, and we're good. Is that how it's supposed to work?" It's funny, because they were used to eight interactions with us, as opposed to two. It's really cool. It's taking an extremely manual process and, with the AI piece, literally helping us make better decisions. It's what AI is all about. It's really amazing. I'm excited about it because now, instead of our support center people trying to find the right part, they're calling the customer and saying, "By the way, you have a problem. We have a solution for you, and we notice in the same cluster you may have a failure in a week. Would you like us to look at that while we're there?" It's predictive, proactive maintenance. That is what it enables us to do, versus reactive. Today, when we are proactive, it's for a fan, or heat, or a battery. We get notice that they are about to fail, and they fail pretty quickly thereafter. But when we start getting to operating systems, there are days, as you know, when you have gone onto your computer and it's been slow. On those days of the month, you can probably look in your network and find that there was a big push to get something done. With TrueSight, we'll be able to start proactively predicting these events before they happen, and rerouting the customer so they don't notice a slowdown. Our tagline is all about uptime. TrueSight helps us deliver that. It helps us deliver it up front. View full review »
We don't use APM. We used to, but we line-item nixed that for various reasons a few years ago. We also don't use ITDA, their next-gen log monitoring tool. So we're truly just within the TSOM interface, as well as doing synthetics. That being said, the Knowledge Modules that BMC brings to the market are what make the implementation work across our varied infrastructure and applications. It's critical to have those Knowledge Modules. If we had to write things ourselves, or use a more generic monitoring environment and then build additional scripts on top of that to monitor the Kubernetes of the world, or the WebLogics of the world, or the Oracles and SQLs of the world - if we had to write scripts ourselves to bring back particular monitoring components and performance metrics and so on - that would be a heavy burden that would keep us from implementing. We don't often run into something that we haven't been able to monitor. It's just a matter of getting people to the table to tell us what they need. When it comes to incident management, we get most of our data from TrueSight. For log data, because we don't use the ITDA interface - it would be an effective interface - we go to our SIEMs, since we're already pumping data to another system there. But TrueSight definitely gives us a view into the health of our business services, which is our primary goal for implementing monitoring. We try very hard not to use event management. What I mean by that is that we do not have a typical NOC. We don't have ten people staring at screens and then escalating as necessary. Along those same lines, we don't spam our incident management environment with events from TrueSight. With a lot of customers I've met over the years, that's essentially the old-school way of doing things. Instead, we create events that are truly actionable. If we don't have an actionable event, we don't create it.
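A rough illustration of that "only actionable events" approach: anything that is neither past its threshold nor predicted to breach one is simply dropped, and the rest alerts the owning team directly rather than flooding a central queue. The event shape, threshold, and routing table below are hypothetical, not TrueSight's actual model:

```python
# Hypothetical routing table: event category -> team responsible.
EVENT_ROUTES = {"database": "dba-team", "web": "web-ops", "storage": "storage-team"}

def route_actionable(events, threshold=90):
    """Forward only actionable events (at/past the threshold, or
    predicted to breach it) straight to the owning team; drop the rest."""
    alerts = []
    for e in events:
        actionable = e["value"] >= threshold or e.get("predicted_breach", False)
        if actionable:
            alerts.append((EVENT_ROUTES[e["category"]], e["name"]))
    return alerts

events = [
    {"name": "cpu high", "category": "web", "value": 95},
    {"name": "disk filling", "category": "database", "value": 70, "predicted_breach": True},
    {"name": "memory normal", "category": "web", "value": 40},
]
print(route_actionable(events))
# [('web-ops', 'cpu high'), ('dba-team', 'disk filling')]
```

The third event never becomes an alert at all, which is the point: no NOC screen, no spammed incident queue.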
We use their baseline technology to ensure that we're only sending items that are either about to have a problem or have passed the threshold of having a problem. If you're talking about typical event management - where you create an event, it gets forwarded to some other system, and there's a notification about it somewhere else, the whole ITSM cycle - we don't use it for that. We use it for creating smart events that send alerts directly to the teams responsible. As I described before, we have many distributed teams rather than a centralized NOC. In terms of TrueSight helping to maintain the availability of our infrastructure, it's an interesting question because of our distributed systems. We have 8,000 hosts across about 40 different teams, and we have 600 different applications that we run. For those critical tier-one apps, teams are highly involved in their day-to-day operations and watch them very closely. Having those two things - the actionable alerts, and the ability to see the health of a system at any given time and check it against what normal looks like for those applications - gives the teams that use it in such a manner the information they need to be confident that their availability is as it needs to be, or better. As far as a hybrid environment goes, we have our own hosting environment because we are the cloud to our clients. So we're not necessarily in that situation. We don't use assets other than what's in our hosting environment. In the past, one of our biggest problems was plain old infrastructure incidents: basic availability incidents where a server or an application, an interface or an endpoint, may not have been available and no one noticed it until some downstream business end-result brought it to our attention. We've essentially eliminated 90 percent or more of those. It has been at least three years since we've done any numbers.
But at the time, we might have had ten to fifteen Sev-One incidents a month. When we last measured it, we were down to one. That was within a couple of years of implementing an enterprise monitoring strategy. As for root cause, when a team is engaged in monitoring to its full extent, we're usually able to get to root cause pretty darn quick. For example, if a team has many servers that could potentially be impacting an application or a business service, tracking something down across those multiple servers and multiple owners could be really tedious and time-consuming. It would be on the order of hours, or at least many minutes, depending on the scope of the issue. With well-implemented monitoring, for our Sev-One apps, teams are able to get to the solution almost immediately. If we have monitoring set up properly, the actionable event will tell them precisely where a critical component has failed, and they can resolve it. When it's a different type of incident, one we might not have a particular monitor for, they're able to use the performance data, availability data, and other related alerts to get to their issue much faster than they used to. Having a good monitoring implementation has made a world of difference to our operations teams. So much so that, if you think back five years, which is an eternity in the IT world, when there was a Sev-One incident back then, someone would walk around tapping people on the shoulder all over the floor. That was very time-consuming. But now, in a well-monitored environment, they're able to collaborate quickly, say, "It looks like this is the problem right here," and get right to the root cause. It's helped our mean time to remediation, and I'm being conservative here, by about 70 to 80 percent. That's an absolutely huge impact. View full review »
We are using this solution to scale our business and to drive greater efficiencies. The other side of it is that it's much better for our end customers, because they no longer have to monitor their own environments for hardware failures. We do that for them. They don't have to recognize that a server has failed. They don't have to pick up the phone or send us an email to open a ticket, and send us files to help us troubleshoot the problem. We're really reducing a lot of the effort required on the customer's side to manage their IT environment using this tool, because we can detect the failure and we can troubleshoot it remotely. And when we do implement the corrective action, we're pretty certain of the root cause, based on the technology and the capabilities of TrueSight. It has improved our time to repair. From the time we get the incident logged to the time we get the customer back up and running, it has improved by 33 percent or greater. It has also improved our ability to fix it right on the first call. It gives us the root cause of the problem, it automates that whole triage, and it gives us the part number of what has failed. We're now at somewhere around a 97 percent first-time fix rate. And that's only going to get better as we get more experienced with the product. That's important to our customers: when we come out, we're going to fix it right on the first call, and not have to come again and again and again. That's really important to the uptime of their IT. We have a graphical representation of this very thing. It shows the old way of service delivery, in which the customer first had to recognize they had a problem. Once they recognized they had a problem, they had to call in or email to open a ticket. Once they opened a ticket, the whole troubleshooting process would begin. We were often calling them as many as eight times per ticket, just to get information about the failure. That was taking a lot of time from the customer.
After that, we would have to dispatch someone with the right part or the right solution, and oftentimes we either brought the wrong part, or we had to bring a handful of parts, which was costly for us and would drive up the cost of the service for the customer. And often there would be a repeat call, because we might not have brought the right part or have sent the right level of skill out on that call. That was the old way of doing it. The new way of doing it for the end-customer is that we call them to let them know we have spotted a problem with their server, for instance, and that we're working on it. We don't have to bother them for log files or diagnostic logs or any of that information anymore because it all comes packaged with the alert from TrueSight. The customer really only hears from us two times now: once, when we open the ticket to let them know we've seen a problem and again after we've resolved it. Another example is that many of our customers have equipment in co-location centers and offsite data centers, where they don't even have anyone to see that there's a problem. Now, we are driving a lot of efficiency for them. They don't have to send people out to check on problems anymore or pay somebody who is running the co-lo to go out and check on something. We're able to see it all remotely through the monitoring tool. That's another huge benefit that we've heard about from our customers. The solution provides us with a single pane of glass where we can ingest data and events from many technologies. In terms of our IT ops management, we have a unique deployment. We actually have it running in our own shop. Everything that we deploy to our customers we deploy internally first. But we've really licensed and implemented TrueSight to drive our services business. We're supporting all of our customers' data centers with the product. We're not connected to all of those yet. We just officially launched the solution in January of 2018. 
We've got about a year and a half in production with the product and we're getting good adoption. The real answer to its effect on our IT ops management is not so much our internal deployment. It's more about the deployment that we're leveraging for all of our 16,000-plus customers globally. We've had a number of cases where, through the analytics in TrueSight, we've actually been able to predict failures. For instance, we've already had a couple of cases where, if we see that a hard drive on a storage array is going to fail, we'll actually send the part out ahead of the failure. That allows us to replace that drive before it fails - and on the customer's planned downtime. In the old model, it fails, it's down. The customer waits for us to come out, swap it out, and bring everything back up. In the predictive model, we know it's going to fail, we send the part out ahead of the failure, and we replace that drive on the customer's scheduled downtime. As we do more of that - and as we expand beyond hardware into the operating system, application, and other layers of infrastructure - we'll be able to exploit the machine learning and AIOps to a greater degree than we do today on the hardware side. The way we talk to our customers about the functionality of the solution across IT ops management, to support business innovation, is that because we've significantly reduced the amount of time they have to spend managing service tickets, they have more time to focus on their digital strategies. We say, "Hey, we're giving you some time back. You don't have to spend all this time interacting with your service provider. You're just going to hear from us when you have a problem and after we've fixed it. We won't bother you for log files and all those things." We're actually giving them time to do more value-added work, like working on their strategic initiatives and their digital transformation initiatives.
I think we'll be able to expand on that as we go forward. View full review »
Learn what your peers think about BMC TrueSight Operations Management. Get advice and tips from experienced pros sharing their opinions. Updated: April 2020.
419,052 professionals have used our research since 2012.
Because we've used it for so long, we've been measuring results for eons. The standard metric that we use, given to us by our CIO, is that 70 percent or more of our outages need to be alert-driven, not customer-driven. So, if a customer calls in and says, "Hey, I'm having an issue logging in to PeopleSoft," which is one of our applications, we should have already known that there was an issue and handled the alert prior to the customer calling in. A decade ago, we were using Microsoft's and HP's product sets to monitor, but it was disparate. The alerts weren't aggregated and we never knew who they would go to. Therefore, we missed a lot of opportunities to be proactive in our organization. Hence the reason we moved to the product which, at that time, was called ProactiveNet - and then it became BPPM and TrueSight, as it is today. We were able to flip that situation, and we have been able to meet that metric for five years running. We had one blip in the year prior to that, and in the years before that, we were knocking it out of the park. So our metric is whether we get the alert before someone has to call in, and we're successful in meeting that some 80 to 90 percent of the time. In addition to that, when we look out across the industry, most organizations have anywhere from five to fifteen people who are dedicated to monitoring. We have two. We're able to run the entire stack, along with its complementary adjacency tools, with two people. That was one of the many reasons that we made the migration from other products to ProactiveNet/BPPM/TSOM. At that time, we were a one-man band and really needed to be able to move quickly, but also to maintain a product without requiring tons of manpower to make it work. The improvements that BMC has made over the last two to three years are really about revamping and consolidating the console, so that it is truly a single console that you can run with a single individual, should you need to.
We have 342 apps in our ecosystem and my team manages around 280 of those from a support-platform standpoint. And because we have two individuals who are dedicated to the monitoring, they partner with the rest of our admin organization to drive exactly how things need to be alerted. We review them quarterly. That is a testament to a really solid product - that it only takes one or two people to really run the thing and administrate it, versus having an entire staff whose only job that is. The solution provides a single pane of glass where we can ingest data and events from many technologies. I am one of the few, at least according to BMC, who has screens up in my hallways, and I show our top 20 applications from a criticality standpoint - what's most important to our organization, the things that I have to run. Everyone sees what's up on those boards every day. I go to them two or three times a day. Because we have that single pane of glass, we see where we're having issues organizationally, and we're able to rally resources - whether it's engineering, operations, or our development group - to solve the problem and get those things from red/yellow back to green/blue. The single pane of glass was a key piece of what we needed to be successful as a monitoring organization. In terms of the availability of our infrastructure, ours is not a hybrid environment, per se. We don't really measure and/or monitor - because of legalities with most of these SaaS providers - how well their systems perform. But what we do is measure any of the interfaces that touch or route to those applications, and we have an uptime measurement of about 99 percent for most of our apps. We have a dashboard for that, which is managed out of the ITSM group. They partner with us and pull all of our monitoring data to figure out two key metrics: total uptime and uptime excluding maintenance.
Those are the two keys which enable us not only to showcase to our customer base how well the systems are performing, but how often they really are available. BMC has helped to reveal underlying infrastructure issues that affect app performance. Four years ago, PeopleSoft was running slow in regard to our payroll run. We run payrolls weekly. If you know anything about payroll, you've got to hit a certain deadline to be able to send the check file to the bank for those direct deposits to show up in people's bank accounts. It's a really sensitive issue when people don't get their checks. With the monitoring tools, we were able to triangulate that it was not an application issue but actually a storage issue. Our solid-state storage was having a firmware issue which was causing slow IO turnover, and therefore slowing down the entire payroll process. We were able to triangulate that that was the issue and decide what we needed to do - which was to move the storage so that the application could continue to perform. We met the need and were able to get the payroll cut just in time, so everyone got their checks. It was a big win. As for reducing IT ops costs: year over year, my operational expenses grow by three percent, which is mostly salary increases. I've gone from 12 resources to roughly 55 resources organizationally, while growing from 80 apps to 280 apps over the last eight years. Our operational costs have only gone up because of the use of licenses, not because of human capital. The tool has helped us work smart, not hard, and leverage the technology. We haven't needed to grow our operational expenses to accommodate the new functionality or the new applications that come into our ecosystem. We just set up the monitoring and it does its thing. View full review »
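The two metrics described in the review above - total uptime and uptime excluding maintenance - are simple to compute from outage records. A minimal sketch, assuming outages are recorded as (minutes, planned-maintenance?) pairs; the exact formulas an ITSM dashboard uses may differ:

```python
def uptime_metrics(period_minutes, outages):
    """Compute (total uptime %, uptime % excluding planned maintenance)
    from outage records given as (minutes, is_maintenance) pairs."""
    total_down = sum(m for m, _ in outages)
    unplanned_down = sum(m for m, maint in outages if not maint)
    maint_minutes = total_down - unplanned_down
    total_uptime = 100 * (period_minutes - total_down) / period_minutes
    # Exclude the maintenance window from the measurement period entirely.
    measured = period_minutes - maint_minutes
    excl_maint = 100 * (measured - unplanned_down) / measured
    return round(total_uptime, 2), round(excl_maint, 2)

# A 30-day month with one 120-minute maintenance window and a 43-minute outage.
month = 30 * 24 * 60  # 43,200 minutes
print(uptime_metrics(month, [(120, True), (43, False)]))
```

Reporting both numbers shows customers what raw availability was and how much of the downtime was planned rather than unexpected.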
With the service modeling, once we managed to build our import process to get our CMDB impact models and services into TrueSight, that was a big win. Because once we integrate it with SolarWinds, they will actually be able to see when there's a problem with the plant, and they will know whether it is a network problem or a server problem. With the service models, they can get right down to the impact of any issue. We're working on some other things to make that easier, like event correlation. So if a network goes out at a plant, they don't need to know that there are problems connecting to 60 servers; rather, they've got a problem with the router. We're currently looking at either consolidating the other monitoring tools that we have around the organization or connecting them, for the single-pane-of-glass goodness. We're bringing in data from SolarWinds, we're bringing in data from Oracle's OEM, and we're integrated with an application that monitors desktops. It generates an event and a ticket is cut to the regional support people. They will go to the desktop and say, "Your disk is in danger of imminent failure. We need to go ahead and clone that guy and replace it before you're down." So we're definitely going with a single pane of glass. In terms of our IT ops management, that means it's getting better. We're trying to be more proactive instead of reactive. We've only been heavily into this for nine or ten months, so the actual, long-term impacts aren't measurable yet. We're still baselining where we are. The single pane of glass is a big improvement. There is also the ability to do predictive and corrective work, especially for some services we're monitoring out in the field which are critical to various plant components. It used to be that they would go down and the plant would call. Now we're detecting that they're down, we're restarting them, and we're letting somebody know there's an issue.
That's also a big improvement in our manufacturing capabilities. Culturally, it is bringing people together with one place to look and giving them something to talk about when there's an issue. It's bringing IT together. The collaborative and predictive stuff is actually starting to improve. We're not doing a tremendous amount of preventative stuff yet - unless you count when your disk is three percent from being full and you need to do something before it fills up. We're not using some of the more advanced features of the predictive analytics yet. We are starting to look at some data analytics though. We have a data analytics group which we stood up, a couple of people who are starting to use data analytics to do some things. It's improving the overall operation, but the impact is going to be measured a little bit later. We've seen some cost deferrals and some cost savings with some support renewals we haven't had to do on some other tools. But we haven't seen the major cost impacts yet. We have spent a lot, but on cost-avoidance for various support tools we have saved close to $1,000,000. In the nine months we've been operational, we've deferred cost on at least two tools. One was about $750,000 and the other was $250,000 for maintenance. It also helps to maintain the availability of our infrastructure across a hybrid, complex environment. I used to work at FedEx and we're not as environmentally complex as FedEx because we consolidate a lot of stuff on the ERP. But if you throw manufacturing in there, we have pretty much every flavor of platform. As with most deployments, we've got three-tier and four-tier applications. You throw the network and some load-balancers in there and it's fairly complex. If you can use a service model to see exactly what's working and what's not, it really gives you the ability to look at some things. The solution has also helped to reveal underlying infrastructure issues that affect app performance. 
Let's say there is a system that is occasionally slow, but you don't know why. Then you find out that it was supposed to be configured to use a large number of LDAP servers for authentication, but somebody had configured it to use one. When you compare the times at which people were having trouble logging on with the CPU and memory usage on your LDAP server, you begin to put things together, without actually analyzing configuration files. You can figure out that the system is configured improperly. When they dig in, they find that it's only talking to one LDAP server. It gives us that kind of diagnostic capability, by looking at everything, and the ability to pin things down. In terms of root cause analysis, we're still working that through. But mean time to repair is going down because the cause is becoming much more obvious. Between the events that people are looking at, which are prioritized, and the service models, which show the actual impacts and relationships, it's becoming much easier. Depending on the event, it's gone from about four to five hours down to 20 minutes. When it works, it's significant. A lot of it is cultural. When you go from everybody monitoring their own stuff and not talking to anybody else, to everybody looking at the same single pane of glass - and you throw a Service Desk on top of that, performing incident management and coordinating things - between the technology, the culture, and the process changes, you're going to see some pretty dramatic improvements. BMC just did a custom KM for us. Typically, on a given server, we want to know when a drive is within three percent of full. But we've got a mix of drives - servers which have anywhere from a 100-gig drive to a terabyte drive - and the percentages that we are worried about are not the same. This request came from our SQL group. BMC was able to adjust the alert parameters based upon the size of the logical drives. That was definitely a business innovation.
I think that was good for BMC too. Although that's a custom KM which we just deployed, I suspect they will make that part of their standard tool kit. View full review »
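The custom KM's exact logic isn't spelled out above, but the idea - scaling the low-free-space alert threshold to the size of the logical drive, so the absolute headroom stays meaningful - might look something like this sketch. The tier boundaries and percentages are made-up examples, not BMC's actual values:

```python
def free_space_alert_pct(drive_gb):
    """Pick a low-free-space alert threshold (percent free) by drive size:
    small drives alert at a higher percentage than large ones.
    Tier boundaries here are illustrative, not BMC's actual values."""
    if drive_gb <= 100:
        return 10   # a 100 GB drive alerts with ~10 GB free
    elif drive_gb <= 500:
        return 5
    else:
        return 3    # a 1 TB drive at 3 percent still has ~30 GB free

def should_alert(drive_gb, free_gb):
    """Alert when free space falls to or below the size-based threshold."""
    free_pct = 100 * free_gb / drive_gb
    return free_pct <= free_space_alert_pct(drive_gb)

print(should_alert(100, 8))    # 8% free on a 100 GB drive -> alert
print(should_alert(1024, 80))  # ~7.8% free on a 1 TB drive -> no alert
```

The same percentage reading triggers an alert on the small drive but not on the large one, which is exactly the problem the SQL group's request addressed.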
We have one application, which is fairly large. In the past, we had Level 1 and 2 NOC support teams who were responsible for watching dashboards. When they saw an issue in the application, they would call Level 2 or 3 support and escalate the call, if necessary. Now, through the use of this product, we have been able to reduce the headcount by five people, as we are able to eliminate the eyes on the glass. We no longer have people watching the dashboard. We have events which are processed automatically through the system and get to the right people. We had six people in L1, and now have one. So we reduced five out of six headcount, which is pretty significant. Also, the average length of time used to be 45 minutes before we had the right engineer on the line, fixing the problem. Now, it's probably three to five minutes. The solution has affected our end-user experience management very positively. Our application teams are very excited about what we're doing with the reduction in headcount. More importantly, the automation it has brought has streamlined so many manual tasks. The teams are very happy with the way things are going. The solution will help us maintain the availability of our infrastructure across a hybrid or complex environment. Right now, we can get to an event scenario or problem quicker than we used to. We are right on the cusp of releasing our service impact modeling. This will help us tremendously because we have a multi-cloud, as well as an on-premise, environment. Any component should show its impact across applications, regardless of where it's located. It has definitely helped in these environments. We have improved our ability to get to a root cause because of the way their tools work. When a problem happens, it lights up a certain model in red. If you follow it down to the lowest level of the diagram, you'll see which member of the tree is the lowest one affected.
So, if it's a database saying, "I'm out of disk space," it may create all types of chaos. Following that tree down, you'll see that the lowest level is the database server, and it has a disk space event. Right there, that's the root cause of all your application issues. So it has helped us get to the root cause more quickly. We're just now gaining momentum on the adoption of this product. We have seen, with a database out of disk space, that because we can get to the root cause quicker, it can be remediated faster. We can also reduce the number of people who have to be on outage calls. There is no need to have network people on a call if it's a database issue. We let them deal with other things, so our operation becomes more efficient. The database people know exactly what the problem is, and quickly. View full review »
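The root-cause walk described above - following the impact tree down from a red application node to the deepest impacted node - can be sketched as a small recursive search. The node structure here is a hypothetical simplification of a service model, not TrueSight's actual data format:

```python
def lowest_failing(node):
    """Descend a service-impact tree from a red (impacted) node to the
    deepest red node, the likely root cause. Each node is a dict with
    'name', 'red' (impacted?), and 'children'."""
    if not node["red"]:
        return None
    for child in node.get("children", []):
        found = lowest_failing(child)
        if found:
            return found
    return node["name"]  # no red child below us: we're the lowest

tree = {
    "name": "billing app", "red": True, "children": [
        {"name": "web tier", "red": False, "children": []},
        {"name": "database server", "red": True, "children": [
            {"name": "disk volume", "red": True, "children": []},
        ]},
    ],
}
print(lowest_failing(tree))  # -> disk volume
```

Everything above the disk volume lights up red, but the walk skips the healthy web tier and lands on the out-of-space volume, which is why only the database people need to join the call.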
One case that we like to use a lot: We have a customer who uses F5 load balancers, and they were managing them with CA products. Those load balancers were generating around 11,000 tickets a month. Just moving them from CA to TrueSight, and replicating the same rules, they went from 11,000 tickets a month to 400 tickets a month. TrueSight did a much better job of doing the same thing. From there, we were able to tune it and got it down to about 40 tickets a month. While this is an extreme example (I don't usually see this type of improvement), it shows the power that is there. We are able to more quickly identify problems and get an engineer on them to restart services, etc. It is not fixing the customer's bugs. They've got buggy apps, and they go down all the time. It is just that we can get them back online faster. View full review »
TrueSight has helped to reduce IT operations costs. The solution has also helped to reveal underlying infrastructure issues that affect app performance. The solution has application monitoring called Application Performance Management. It's an improvement on the old, traditional TMR. It's integrated within the TrueSight solution. It will notify regarding application performance and report issues with applications. View full review »
A lot of customers, if they're not using these products, don't know that they have an IT issue until one of their own customers contacts them and says, "I've got a problem." With TSOM, they are able to be more proactive. The IT department gets alerted more quickly, and sometimes they can resolve issues before the customer even knows that there is an issue. This solution helps our customers reveal underlying infrastructure issues that affect app performance. It has good monitoring all the way from storage up to the servers. All the things I'm seeing in the cloud now are very good as well. View full review »