CA UIM (DX Infrastructure Manager) Review

There is flexibility in the SDKs to customize it. Topology discovery and root cause analysis would be nice to have.


What is most valuable?

The main feature for us is the flexibility of its message bus and API, which let you make it do what you need it to do, since everyone's environment is different. The SDKs also give you room to customize it.

How has it helped my organization?

It really depends on where you're coming from. In 2009, we were working with Nagios and weren't particularly unhappy, but there was an executive decision to go in a different direction (this was before the product was UIM, when it was called Nimbus). We were out of date and weren't taking advantage of some of the new features to see whether they would make a difference for us. There were new capabilities, such as analytics and machine baselines versus static thresholds.

That said, it does reduce alarm noise for us. It raises an alarm when something is actually going on, not just when there's an expected spike that happens every night on a server.
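To make the baseline-versus-static-threshold point concrete, here is a minimal sketch in Python. It is not UIM's actual baselining algorithm; the function names, the 90% static threshold, the three-sigma rule, and the sample readings are all assumptions chosen for illustration.

```python
from statistics import mean, stdev

def static_alarm(value, threshold=90.0):
    """Classic static threshold: alarm whenever the value exceeds a fixed limit."""
    return value > threshold

def baseline_alarm(value, history, k=3.0):
    """Alarm only when the value is well above what is normal for this time slot.

    history holds past readings for the same hour of day (hypothetical data).
    The floor of 1.0 on the deviation is an assumption to avoid alarming on
    tiny fluctuations when the history is very flat.
    """
    if len(history) < 2:
        return False  # not enough data to form a baseline yet
    mu = mean(history)
    sigma = max(stdev(history), 1.0)
    return value > mu + k * sigma

# CPU % seen during the nightly backup window on previous nights (made-up values)
backup_hour_history = [93.0, 95.0, 94.0, 96.0, 92.0]
# CPU % seen at midday on previous days (made-up values)
midday_history = [20.0, 22.0, 19.0, 21.0, 23.0]

# Expected nightly spike: static threshold fires, baseline stays quiet.
print(static_alarm(95.0), baseline_alarm(95.0, backup_hour_history))
# Unusual midday load: static threshold misses it, baseline catches it.
print(static_alarm(60.0), baseline_alarm(60.0, midday_history))
```

The first case is the nightly spike that a static threshold keeps alarming on; the second is a genuine anomaly that never crosses the static limit.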

What needs improvement?

Topology discovery and root cause analysis would be nice to have, although they may not work in our environment. Right now, we don't have RCA or topology awareness. It may be in the new version, but based on our architecture, it may not work. It would be a big win, however, if we had it.

Another useful feature would be automatic configuration, according to a standard, of new robots that check in for any particular customer. This could help us decrease configuration time.

For how long have I used the solution?

We've had the same version since the install in 2009. We're looking to upgrade, and we do have the latest version in our lab, but I'm anxious to have it available in prime time.

What was my experience with deployment of the solution?

We've had no issues with deployment.

What do I think about the stability of the solution?

We've had some issues that may have to do with versioning, though not completely. In our backend, the database structure and message bus are on a really old version, though the hub is on the newest version. There's a point at which new features on the hub may no longer be possible. This may be where our version is hurting us.

We do have times when the hubs get choked up, and we've never been able to isolate why with support. Is it something in our environment, or is it something they see from other customers? Is it hubs that are too busy? Is it our REX infrastructure? I've had several support cases over the years about a scenario where the hub gets into a partially functioning state, so all the robots realize it's not working normally and move over to their backup hub. The original hub still expects to hear from all those robots, so we get a flood of hundreds of alarms saying, "Robot inactive. These robots are not checking into me." In reality, they're just checking into the other hub.

That's the issue -- there's no intelligence at that layer. And because of that, one of our most common alarm floods is from the hub itself.
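As a rough illustration of the intelligence that's missing at that layer, here is a minimal Python sketch that drops "Robot inactive" alarms for robots that have recently checked into a different hub. It does not use any UIM API; the `Alarm` class, the `last_checkin` mapping, the hub and robot names, and the 300-second window are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Alarm:
    hub: str      # hub that raised the alarm
    robot: str    # robot the alarm is about
    message: str

def filter_failover_noise(alarms, last_checkin, now, max_age=300.0):
    """Drop 'Robot inactive' alarms for robots alive on another hub.

    last_checkin maps robot name -> (hub name, timestamp in seconds) of the
    most recent check-in seen anywhere (a hypothetical data source).
    """
    kept = []
    for alarm in alarms:
        if alarm.message == "Robot inactive":
            seen = last_checkin.get(alarm.robot)
            if seen is not None:
                hub, ts = seen
                # The robot is reporting to a different hub and did so recently,
                # so it has failed over rather than gone down.
                if hub != alarm.hub and now - ts <= max_age:
                    continue
        kept.append(alarm)
    return kept

alarms = [
    Alarm("hub-a", "web01", "Robot inactive"),
    Alarm("hub-a", "db01", "Robot inactive"),
    Alarm("hub-a", "web01", "CPU high"),
]
last_checkin = {
    "web01": ("hub-b", 990.0),  # failed over to the backup hub
    "db01": ("hub-a", 100.0),   # last seen on hub-a long ago: really inactive
}
result = filter_failover_noise(alarms, last_checkin, now=1000.0)
for alarm in result:
    print(alarm.robot, alarm.message)
```

In this example, only the alarm for the robot that moved to the backup hub is suppressed; the robot that is truly silent and the unrelated CPU alarm still come through.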

I had an escalation one time to double-check that the hub had failed over okay and was back online, because they got a hundred tickets opened all at the same time. The hub is the main source of instability we've had. We have hubs at other sites that don't have as many robots or aren't doing as many ping checks, and they have far fewer issues. It could be that some of these hubs are just too busy and are more likely to get choked up.

There's also the issue of portal performance. We have UMP released to our customers, and it's not awful for them. When a customer logs in, from a security standpoint, they only see their own data. If they have 10 servers that we manage for them, the performance isn't awful in that scenario. When we log in as internal employees with permission to see all of our data from thousands of devices, the performance is a lot slower and a lot more painful. We're also several versions behind on the portal.

What do I think about the scalability of the solution?

We've had no issues with scalability.

How is customer service and technical support?

We've had some concerns, especially since CA's acquisition and re-branding of Nimsoft. For a while, there was a dedicated support center just for the monitoring product. But now, there's a more standardized support structure where Tier 1 is not as specialized. I haven't had enough cases to claim that it's worse than before, but we have had tickets that have dragged on for a long time.

There was an instance where they fixed a bug after 6-8 months, but it was the wrong bug. There are a couple of threads on their forum about comical support interactions where customers get told, "Oh, that's an enhancement. Go type it and we'll vote on whether to fix it. Go type it out on the forum." I don't think that's the experience in every case, but we have had some challenges like that, where our reaction is, "How are you calling this an enhancement? This is just basic core functionality that's not working." Getting agreement on that has been a challenge at times.

What was our ROI?

When we first implemented Nimbus in 2009, it wasn't fully vetted by the technical staff because management pushed it on them. For what we pay, I know many executives don't think we're getting enough ROI. We're doing basic monitoring -- CPU, memory, disk space, SQL responses, URLs, pings, and custom probes we've written using their SDK. Writing our own probes is one of the perks with something like Nagios.

The licensing cost is several hundred thousand dollars a year, and we're only getting several hundred dollars' worth of value since we're doing basic stuff. That's the challenge.

And we're hamstrung because we're still using the older version, and we're not getting great ROI. There's a lack of clarity on where we want to go, but we could do a whole lot more than we're doing.

What other advice do I have?

There are things that are nice in just covering the basics for us, but then we have pain points on some of the more advanced stuff.

Disclosure: IT Central Station contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.