What is most valuable?
The most valuable features we're taking advantage of today are computer memory and disk monitoring and the alerting on them. We're able to predict how close we are to our thresholds, so we can head off a disaster and troubleshoot an issue before it becomes a big problem.
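As a rough illustration of what "predicting how close we are to a threshold" can mean, here is a minimal sketch that fits a straight line to recent samples and estimates the time left before the threshold is crossed. This is not how UIM does it internally; the function name `hours_until_threshold`, the sampling interval, and the sample values are all assumptions made up for the example.

```python
from typing import Optional, Sequence


def hours_until_threshold(
    samples: Sequence[float],
    threshold: float,
    interval_hours: float = 1.0,
) -> Optional[float]:
    """Estimate hours until a metric reaches `threshold`.

    Uses an ordinary least-squares line through evenly spaced samples.
    Returns 0.0 if the latest sample is already at or over the threshold,
    and None if there is too little data or the trend is flat or falling.
    """
    n = len(samples)
    if n < 2:
        return None
    if samples[-1] >= threshold:
        return 0.0

    # Least-squares slope and intercept with x = 0, 1, ..., n - 1.
    mean_x = (n - 1) / 2
    mean_y = sum(samples) / n
    sxx = sum((x - mean_x) ** 2 for x in range(n))
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(samples))
    slope = sxy / sxx
    if slope <= 0:
        return None  # not trending toward the threshold

    intercept = mean_y - slope * mean_x
    x_cross = (threshold - intercept) / slope  # sample index where the line hits the threshold
    remaining_samples = max(x_cross - (n - 1), 0.0)
    return remaining_samples * interval_hours


# Hypothetical hourly disk-usage percentages, with a 90% alert threshold.
print(hours_until_threshold([70, 72, 74, 76, 78], threshold=90))
```

A straight-line fit is the simplest possible choice; real workloads with daily or weekly cycles would likely need something more careful.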
How has it helped my organization?
We have the solution across all the production servers in our operation. Over the last 4 or 5 years, we have seen about a 30% decrease in crisis-management escalations and fewer severity events, because we are trending and tracking against thresholds and get early warning alerts. Our goal is to trigger a resolution before an issue escalates into a critical event.
We monitor SQL databases and a lot of servers, both Windows servers and Linux boxes, as well as network equipment.
What needs improvement?
One of the things we'd like to see is a more streamlined, baseline reporting mechanism. We use several CA products and we'd like to see all of them dump information into a common format so that we can harvest it into multiple dashboards. Right now, if you use multiple applications, you need 3 different experts on 3 different reporting structures. We'd like to see the products come with a unified database and the ability to harvest that data.
What do I think about the stability of the solution?
We haven't had any issues with downtime with the solution at all. At times, the robots, which are probes that log on to the servers, will hang or fall offline, and we generally have an auto-restart if that happens. Most of the time we find out we caused it ourselves because somebody was performing maintenance.
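For readers unfamiliar with the auto-restart idea, here is a minimal, generic watchdog sketch. It does not use any UIM API; `is_alive`, `restart`, and `in_maintenance` are hypothetical callables you would supply yourself, and the restart limit is an assumed safeguard rather than anything from the product.

```python
import time
from typing import Callable


def watch_and_restart(
    is_alive: Callable[[], bool],
    restart: Callable[[], None],
    checks: int,
    max_restarts: int = 3,
    interval_seconds: float = 0.0,
    in_maintenance: Callable[[], bool] = lambda: False,
) -> int:
    """Poll a probe `checks` times and restart it when it is down.

    Skips checks during a maintenance window, and raises once the restart
    budget is used up so a person can look at it. Returns the restart count.
    """
    restarts = 0
    for _ in range(checks):
        if not in_maintenance() and not is_alive():
            if restarts >= max_restarts:
                raise RuntimeError("probe keeps failing; escalate to a person")
            restart()
            restarts += 1
        time.sleep(interval_seconds)
    return restarts
```

The maintenance check is there because, as noted above, a probe that drops offline during planned work is usually not a real failure.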
What do I think about the scalability of the solution?
We've been able to scale it across 20 platforms in 3 different data centers. It doesn't mean it's simple, but once you've got your thresholds down and your methodology, your strategy of what you want to monitor, it works pretty well.
How are customer service and technical support?
We've used technical support before, especially when we first loaded it onto our servers and installed our design. They were very responsive and stuck with us on the phone until we resolved our issues, and at some points they had to come back a few days later with a solution or a patch to fix things for us.
Which solution did I use previously and why did I switch?
We were having system outages, or server outages, or connectivity outages with the network and we weren't able to see it. The tools we had in place weren't robust enough and weren't flexible enough for us to design thresholds and different levels of monitoring. We started researching tools and we decided on UIM.
How was the initial setup?
I think the initial setup was pretty straightforward. It was a little more complex than we thought, but it wasn't insurmountable. The biggest challenge we had was that we didn't understand how our applications ran or how our hardware was responding to our applications, so we set the thresholds pretty low, generated a lot of alerts, and then had to adjust. That was probably the biggest challenge we had going into the project.
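One way to avoid starting with thresholds that are too low is to derive them from a baseline of observed values. The sketch below is only an illustration of that idea, not a UIM feature; the function name `suggest_thresholds` and the choice of the 95th and 99th percentiles are assumptions.

```python
from statistics import quantiles
from typing import Dict, Sequence


def suggest_thresholds(
    baseline: Sequence[float],
    warn_pct: int = 95,
    crit_pct: int = 99,
) -> Dict[str, float]:
    """Suggest warning and critical thresholds from baseline samples.

    `baseline` needs at least two samples; percentiles must be 1..99.
    """
    # quantiles(..., n=100) returns the 99 cut points for percentiles 1..99.
    cuts = quantiles(baseline, n=100, method="inclusive")
    return {"warning": cuts[warn_pct - 1], "critical": cuts[crit_pct - 1]}


# Hypothetical baseline: CPU utilization samples collected during normal operation.
print(suggest_thresholds([35, 40, 42, 38, 55, 61, 47, 44, 39, 70]))
```

Thresholds derived this way still need review by someone who knows the application, since a baseline collected during a quiet period will understate normal peaks.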
Which other solutions did I evaluate?
We chose CA primarily because of the size of the company and past relationships at other companies I have worked for. We also looked at vendors for many different products, but we chose CA because of the flexibility and supportability of the product.
What other advice do I have?
When selecting a vendor we are first and foremost looking for a partner. We're not interested in a vendor/client relationship. We're not interested in just being a dollar sign at the end of the quarter. We want somebody who will work to understand our business and understand what's unique about us. I'm sure that's a common thread with many customers but it's really important for us to have a partner relationship. The second thing is we want serviceability. We want to be able to call tech support, or talk to a systems engineer, and have them engage with us and work with us through a problem, not just throw us canned solutions and assume we're going to apply those and walk away.
I'd rate it a 9/10. First of all, I don't know that I ever reach a 10 with any vendor, but it's a 9 because the solution works as advertised and the service is there. The responsiveness of the tech support is very, very pleasing. They come back to you when they say they will, and they follow up on their commitments. We've had some challenges expanding our footprint in other data centers. Like I said earlier, it's not perfectly easy. It is complex, but once you get it dialed in, you're up and running and everything's smooth. Their service teams have been there with us all the way, so that's really important for us.
When it comes to advice to others, I think you should focus on having an understanding of what you want to measure and monitor in your environment. It's more than just saying, "Yeah, we're going to monitor all the servers." What thresholds? What do you expect your CPU utilization to be? What do you expect your memory utilization to be? What's important to you in terms of customer service responsiveness? Do you have a systems engineer who's willing to put the time in to understand your business before providing you a solution? Those things are really key for us.
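To make "what thresholds?" concrete, it can help to write the expectations down before configuring any tool. The following is a minimal sketch of such a plan with warning and critical levels per metric; the metric names and numbers are hypothetical placeholders, not recommendations, and none of this reflects UIM's own configuration format.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class Threshold:
    warning: float   # early-warning level
    critical: float  # level that should page someone


# Hypothetical plan; replace with values that fit your own environment.
PLAN: Dict[str, Threshold] = {
    "cpu_percent": Threshold(warning=80.0, critical=95.0),
    "memory_percent": Threshold(warning=85.0, critical=95.0),
}


def classify(metric: str, value: float, plan: Dict[str, Threshold] = PLAN) -> str:
    """Return "ok", "warning", or "critical" for one reading of a metric."""
    threshold = plan[metric]
    if value >= threshold.critical:
        return "critical"
    if value >= threshold.warning:
        return "warning"
    return "ok"


print(classify("cpu_percent", 85.0))
```

Keeping the plan as plain data makes it easy to review with the people who own each application before any alerts go live.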