BMC TrueSight Capacity Optimization Benefits

Application Performance Management Specialist at an insurance company with 5,001-10,000 employees
On the capacity management side, just within the last month, there have been three systems that we've gone back to review, and we've said, "Oh my gosh, these are way oversubscribed." We looked at what their max needs are, trimmed them down, and freed up a bunch of resources. We estimate that right-sizing those systems saved us close to $100,000, because the freed-up resources meant we didn't have to buy new hardware. So we saved the company money that way.
There are countless other examples. In our customer system, we have identified many performance problems and traced them right down to the process level, whether it was a memory problem or a hung-thread problem. These are all things that we've been able to use the tool to get to. We've used it on almost every system in the company, including things like the telephony system. With the vendor that we have, it's notorious now: we have always been able to pinpoint a memory leak that keeps getting reintroduced into their code after releases. They'll address it and get it fixed, and then we'll get a release and, all of a sudden, that memory leak comes back. But our tools allow us to see those things.
And there is the correlation analysis, which we already do with the business. We know when some kind of campaign is expected to increase business, and we know that our field force is going to be doing a lot more quotes. We run those analyses to make sure that we have enough capacity on the floor to handle that expected new load.
It helps maintain the availability of our infrastructure across our hybrid environment. We are collecting on HDI environments now, Nutanix and Exadata. We definitely have an impact on availability because we're making sure that we catch anything unusual. One of the features I didn't mention about TrueSight Capacity Optimization is exception reporting, which is something we've successfully enabled.
Rather than wait until something is saturated, whether CPU or memory, we track what's normal. Exception reporting looks at the last 30 days for, say, every Monday at 10:00 AM, and derives a norm. If this Monday is 50 percent above or below that norm, it gets flagged in our exception report. It's another automated report that runs daily and is reviewed. There are hundreds of things we've caught with the exception reporting, whether memory, CPU-related, or IO-related, long before they got so bad that systems crashed or availability was impacted.
In terms of the solution helping to reveal underlying infrastructure issues that affect app performance, it identifies whether we have a CPU or memory constraint, which impacts application performance. We have also used the tool very effectively when a chassis has been saturated and has caused some weird IO problems. Multiple applications and servers will be having problems, and it's not until we tie them together as being in the same chassis, and stack those servers and their performance data together for that chassis, that we find, "Oh, we have an IO bottleneck." We've used the tool to identify that and prove it out to the storage teams. That's definitely more of a hardware or infrastructure limitation that impacts applications, and the solution can definitely be used to help identify those.
We believe it helps reduce our mean time to remediation. When we're involved and using the solution, we usually get to the bottom of things a lot quicker, probably 50 percent quicker. Sometimes people are having problems and they don't call us, and sometimes it goes on for hours; it never goes on for a day now, because we've proven ourselves to be too valuable. But in the past, they would try to figure stuff out for hours or a day or two and then, finally, somebody would come to us.
We help point everybody in the right direction very fast, and I think our data helps reduce that time, certainly by at least 50 percent. These are all real things we do every day. And that's not all. There's more we could be doing.
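To make the exception-reporting rule described earlier concrete, here is a minimal Python sketch of the idea: take the samples from the last 30 days that fall on the same weekday and hour as the current reading, average them into a norm, and flag the reading if it is 50 percent or more above or below that norm. This is only an illustration of the logic as described in this review, not how TrueSight Capacity Optimization implements it; the function name `is_exception`, its parameters, and the use of a simple mean as the "norm" are all assumptions.

```python
from datetime import datetime, timedelta
from statistics import mean


def is_exception(history, now, current_value, lookback_days=30, threshold=0.5):
    """Return True if current_value deviates from the norm by >= threshold.

    history: iterable of (timestamp, value) samples (hypothetical input format).
    The norm is the mean of samples from the lookback window that share
    the same weekday and hour as `now` (an assumed definition of "norm").
    """
    window_start = now - timedelta(days=lookback_days)
    comparable = [
        value
        for ts, value in history
        if window_start <= ts < now
        and ts.weekday() == now.weekday()
        and ts.hour == now.hour
    ]
    if not comparable:
        return False  # no baseline yet, so nothing to compare against

    norm = mean(comparable)
    if norm == 0:
        return current_value != 0  # any activity is unusual against a zero norm

    deviation = (current_value - norm) / norm
    return abs(deviation) >= threshold


# Illustrative data: CPU percent at 10:00 on four consecutive Mondays,
# plus one Tuesday sample that the rule should ignore.
history = [
    (datetime(2024, 1, 1, 10), 40.0),
    (datetime(2024, 1, 8, 10), 40.0),
    (datetime(2024, 1, 15, 10), 40.0),
    (datetime(2024, 1, 22, 10), 40.0),
    (datetime(2024, 1, 23, 10), 90.0),
]
this_monday = datetime(2024, 1, 29, 10)
```

With this made-up data the Monday 10:00 norm is 40, so a reading of 65 or 18 would be flagged and a reading of 45 would not. Because the comparison is against what is normal for that time slot rather than against a fixed saturation limit, unusual behavior shows up while the system still has headroom.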