BMC TrueSight Operations Management Review

Monitors a mix of on-prem and cloud, and predictive alerts help maintain availability


What is our primary use case?

TrueSight Operations Management includes infrastructure monitoring as well as application performance monitoring. The primary use case that I have seen over the last few years is infrastructure monitoring, along with network monitoring. Overall, the use case is monitoring of IT infrastructure, including the network, together with alerting and event management.

Occasionally, we have seen customers who are interested in application performance management as well.

The actionable alerts that we get from monitoring the infrastructure or application are the end-result of the monitoring. Most of our customers are interested in those alerts and in having a ticket created out of the alerts in their ITSM solution.

I have deployed this solution, along with other BMC solutions, for many customers across multiple verticals, like healthcare, banking, and telecom. I have done eight to 10 implementation projects of TrueSight. Our company sells BMC Software solutions and we implement, develop, and support them.

How has it helped my organization?

In a project that we're working on for a telecom company, the number of alerts or events, and subsequently the number of tickets created from those events, was really high when we started implementing TrueSight Operations Management. After applying the intelligent thresholds in TrueSight and doing the event management work (correlating related alarms, deduplicating alarms, and suppressing unwanted alarms), we have been able to reduce the number of events, and hence the number of tickets, by almost 60 percent.

Before that, their data center NOC team was overwhelmed with the number of tickets and events. By applying the intelligent thresholds, which are called signature thresholds in TrueSight, we have been able to reduce the noise, including false positives and even false negatives. We have been able to give them only the most important, actionable alarms and tickets. This has freed up a lot of the network operations team's time, so they can focus on other things alongside their regular work.
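To make the noise-reduction idea concrete, here is a minimal Python sketch of deduplication and suppression. It is not TrueSight's rule engine or MRL. The `Event` fields, the 300-second window, and the choice to suppress `INFO` events are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Event:
    # Hypothetical event shape, not a TrueSight data structure.
    host: str
    metric: str
    severity: str
    timestamp: float  # seconds since some reference point


def reduce_events(events, window_seconds=300.0, suppressed_severities=("INFO",)):
    """Drop suppressed severities and collapse repeats of the same
    (host, metric) pair that arrive within window_seconds of the last kept one."""
    last_kept = {}  # (host, metric) -> timestamp of the last event we kept
    kept = []
    for event in sorted(events, key=lambda e: e.timestamp):
        if event.severity in suppressed_severities:
            continue  # suppression: unwanted alarms never reach the NOC
        key = (event.host, event.metric)
        previous = last_kept.get(key)
        if previous is not None and event.timestamp - previous < window_seconds:
            continue  # deduplication: same alarm repeated inside the window
        last_kept[key] = event.timestamp
        kept.append(event)
    return kept


raw = [
    Event("db1", "cpu", "CRITICAL", 0),
    Event("db1", "cpu", "CRITICAL", 60),   # duplicate inside the window
    Event("db1", "cpu", "INFO", 90),       # suppressed severity
    Event("web1", "cpu", "MAJOR", 100),
    Event("db1", "cpu", "CRITICAL", 400),  # outside the window, kept
]
actionable = reduce_events(raw)
print(len(raw), "->", len(actionable))
```

In a real deployment only the surviving events would go on to create tickets, which is where the reduction in ticket volume comes from.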

It also helps maintain the availability of infrastructure across a hybrid or complex environment. Because TrueSight can monitor network devices, databases, storage, cloud environments, and a mix of on-prem and cloud, our solutions keep checking the availability of all the devices in the infrastructure and they alert you when there is an issue. So it definitely helps in maintaining the availability. You can also configure predictive alerts or intelligent thresholds or predictive thresholds. Using them, TrueSight will try to give you an alert before something goes wrong. It will look at the threshold and it will look at the trending data for a particular metric, and before that threshold is crossed, it will give you a predictive alert saying that this threshold may be crossed in the next 15 minutes or 30 minutes. So it helps maintain the availability of your environment.
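The idea behind a predictive alert can be sketched in a few lines of Python. This is only an illustration under my own assumptions: it fits a straight line to recent samples and extrapolates, which is not necessarily how TrueSight computes its predictions. The function names and the 30-minute horizon are hypothetical.

```python
def minutes_until_threshold(samples, threshold):
    """samples is a list of (minute, value) pairs, oldest first.
    Returns the estimated minutes from the last sample until a least-squares
    trend line reaches the threshold, 0.0 if it is already crossed,
    or None if the trend is flat or falling."""
    if len(samples) < 2:
        return None
    n = len(samples)
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    denom = sum((t - mean_t) ** 2 for t, _ in samples)
    if denom == 0:
        return None  # all samples share one timestamp, no trend to fit
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / denom
    last_t, last_v = samples[-1]
    if last_v >= threshold:
        return 0.0
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    crossing_t = (threshold - intercept) / slope
    return max(crossing_t - last_t, 0.0)


def predictive_alert(samples, threshold, horizon_minutes=30):
    """True when the trend is expected to cross the threshold within the horizon."""
    eta = minutes_until_threshold(samples, threshold)
    return eta is not None and eta <= horizon_minutes


# Disk usage (percent) climbing one point per minute, threshold at 90 percent.
disk_usage = [(0, 50), (5, 55), (10, 60), (15, 65)]
print(minutes_until_threshold(disk_usage, 90))
print(predictive_alert(disk_usage, 90, horizon_minutes=30))
```

With these sample values the line reaches 90 percent about 25 minutes after the last sample, so a 30-minute horizon raises the alert and a 15-minute horizon does not.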

In addition, it helps to reveal underlying infrastructure issues that affect application performance, if you're monitoring an application using TrueSight APM. You can monitor an application and record the important transactions in the application that you're interested in. That is called synthetic monitoring. For example, on a banking site, the user login could be the transaction you record.

The app visibility part discovers the application automatically, and it can even monitor at the code level. For example, if there is something wrong in a transaction, maybe on the HTTP response or at the Java or .NET code level, it can indicate where the problem may be in the application. TrueSight also has Probable Cause Analysis. If you are monitoring your IT infrastructure completely, it can correlate the alerts and give you the most probable cause of a particular alert. Again, this can help you figure out the underlying issues in the environment.

The TrueSight solution has built-in intelligence. It uses its analytical engine, an AI engine, to look at the performance data for anything that it's monitoring and it creates a baseline of the performance. Then, it gives you abnormality alerts based on the baseline. Even if your threshold is not crossed, but the baseline of that metric is crossed, it will intelligently give you an alert saying that this metric is trending above the baseline. There may be a case where the static threshold has been set too high, but TrueSight has the intelligent analytical engine that can analyze the trend or the baseline, and then give you an intelligent alert. The Probable Cause Analysis uses the analytics engine to figure out what the probable cause may be for a particular alert. BMC is making good progress in terms of AI.
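The difference between a static threshold and a baseline can be shown with a small Python sketch. It assumes a very simple baseline, the mean of recent history plus three standard deviations, which is my own simplification and not BMC's analytics engine. The function name and the labels it returns are hypothetical.

```python
from statistics import mean, stdev


def classify_sample(history, value, static_threshold, band_width=3.0):
    """Compare a new value against a static threshold and against a simple
    baseline band (mean of history plus band_width standard deviations)."""
    if len(history) < 2:
        raise ValueError("need at least two historical samples for a baseline")
    upper_band = mean(history) + band_width * stdev(history)
    if value >= static_threshold:
        return "threshold_alert"
    if value > upper_band:
        # Static threshold not crossed, but the value is abnormal for this metric.
        return "abnormality_alert"
    return "normal"


# CPU utilization normally sits around 40 percent; the static threshold is 90.
cpu_history = [40, 42, 41, 39, 43, 40, 41]
print(classify_sample(cpu_history, 42, static_threshold=90))
print(classify_sample(cpu_history, 60, static_threshold=90))
print(classify_sample(cpu_history, 95, static_threshold=90))
```

The middle case is the interesting one: 60 percent is far below a static threshold of 90, yet well outside what is normal for this server, so a baseline-aware check still flags it.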

Mean time to remediation is related to the Probable Cause Analysis and to integration with other components, such as orchestration or executing a remote action. It definitely helps in reducing the mean time to remediate, but it depends on the expertise of the TrueSight administrator. In my current assignment we have implemented TrueSight for a large customer in the Middle East, and we have quantified how much we have reduced the mean time to remediate. For the top-priority incidents, we have reduced the MTTR from 12 hours to 1.5 hours, a reduction of 87.5 percent.

One of the most prominent features and values of the solution is that it helps to reduce IT operations costs. If you are using Operations Management and TrueSight Capacity, you can get a real picture of how much your IT assets are utilized, and how much of their capacity is saturated or underutilized. It gives you a very clear picture of your entire IT infrastructure, including your network devices and your cloud infrastructure. Your entire infrastructure is monitored and optimized for capacity, and that helps you save costs in your IT operations. I would estimate savings of 20 to 30 percent. I haven't calculated it myself. There are much higher numbers claimed by BMC.

What is most valuable?

The event management part of TrueSight Operations Management, in my experience, is probably the best in the market. You have endless flexibility. You can build your own rules, you have the MRL language, and you can implement any kind of logic on the alerts. It may be correlation, abstraction, or executing something as a result of the alerts. You have almost the whole range of options available for event management using the available customization. I've seen a couple of other solutions, like IBM's and HPE's for event management, and TrueSight Operations Management is far superior to them in event management.

The breadth of the solution's monitoring capabilities is a major selling point for the solution because it is incomparable. You can monitor almost any kind of server, all types of storage, network devices, databases, and even do application monitoring. You also have the option to develop your own Knowledge Module. If something that you want to monitor is not available, you can build your own Knowledge Module to monitor whatever you need. We also have cloud monitoring solutions, which are doing pretty well now. We have AWS, Microsoft Azure, Google Cloud, and container monitoring. The breadth covered by BMC for monitoring of IT infrastructure is really extensive. That breadth of monitoring is really valuable because we can cover almost any monitoring use case that customers come up with.

Also, the end-to-end, automatic ticketing is a major selling point. The whole flow is covered: generating an alert or an event, doing event management, creating a ticket from the event, and automatically closing the ticket from the event or the event from the ticket. Most of our customers who have on-premises ITSM solutions use BMC Remedy. It is the most popular on-prem solution for ITSM. When customers have Remedy ITSM, it becomes a really good decision to use TrueSight Operations Management, and to use the out-of-the-box integration between the two solutions. That way, the ticketing is done automatically from the event and vice versa.

In addition, the solution provides a single pane of glass where you can ingest data and events from many technologies. That's one of the major selling points that BMC is pitching for TrueSight Operations Management. You can monitor everything: servers, networks, databases, and your applications. You can also implement capacity optimization and the Presentation Server has a single console, a view and dashboards, where you can see everything in one place.

Previously, BMC called TrueSight a "manager of managers" because TrueSight can be integrated with almost every other monitoring and ticketing tool. For example, in my current project, we have integrated at least 20 other monitoring and alerting systems with TrueSight, and all the other systems are sending their events or alerts to TrueSight. Then, in TrueSight, we are doing the event management to reduce the noise, and filter out unwanted alerts, and get only the required alerts. Even for other integrations, TrueSight acts as a single pane of glass, where you have all these disparate systems. You can integrate all of them with TrueSight and get all the events and alerts in a single window.

What needs improvement?

In terms of root cause analysis, BMC TrueSight has a couple of modules, Service Impact Management and Probable Cause Analysis, which work together to help you identify related events. On paper, these modules have a lot of promise, but they are actually really complicated. There are many small pieces working together and you need a lot of expertise to get any value out of the root cause analysis piece of the solution. For that reason, most customers don't really get much value out of the root cause analysis part of TrueSight.

There are other areas with room for improvement as well. For example, the monitoring part requires four or five different types of agents to monitor different things in your infrastructure, which makes things very complicated.

In addition, to implement the Operations Management solution alone, you need a lot of hardware; a lot of servers and a lot of hardware resources. If you compare it with other solutions in the market, like Dynatrace or AppDynamics, the implementation of those products can be done using notably fewer servers. If you want to set up a standalone TrueSight Operations Management for a customer, you need at least 10 servers to implement Infrastructure Management and Application Performance Management. To do the same implementation for Dynatrace or AppDynamics or SolarWinds you only need three or four servers maximum, for the same environment. So the number of resources required for implementation is very much on the higher side.

The complexity of the solution is, again, a challenge. There are so many different components that it becomes almost a nightmare for the operations teams to do the administration and apply hotfixes, patches, and to do daily operations for the solution.

It's too complex, too many servers are required, there are too many different components in the solution, and a lot of agents are required.

Apart from that, some of the intelligence features could also be enhanced. For example, the AI part of TrueSight Operations Management should be enhanced to compete with other products in the market.

For how long have I used the solution?

I've been using BMC TrueSight Operations Management for the last nine years, approximately.

What do I think about the stability of the solution?

Once the solution is deployed and the fine-tuning recommendations are in place, the solution is very stable. In my current environment we haven't seen any issue whatsoever in the last year. We have at least 20 servers running various TrueSight components, and none of them has had any issues in that time. So in that time the availability has been 100 percent and it has been 100 percent stable.

What do I think about the scalability of the solution?

It does scale well, but my concern with the solution is that when you want to scale it up the complexity increases. That is mainly because of the number of different components or software pieces that work together.

The multitenancy mode of TrueSight has a lot of room for improvement. It's like if you have a building and there are many apartments in it, you can have multiple tenants in the same building. If you want to add a tenant, you just give them an apartment in the same building. But with TrueSight, to set up multitenancy, you have to set up separate "buildings" altogether, instead of compartmentalizing into "apartments," which makes everything much more complex.

How are customer service and technical support?

I have been using BMC support for many years. Generally speaking, support is very good and, comparatively, it is much better than the competitors' support departments. But over the past couple of years, the technical expertise of the support team has consistently gone down. 

Generally, the response from BMC support is excellent. You get a response almost immediately. And if the support team is unable to resolve your issue, then they coordinate with their development or customer engineering team very quickly, which is the best part.

If you are trying to get technical support from Microsoft, for example, and the support team is unable to resolve your problem, it can take months to get to a higher level in the support hierarchy. And reaching the development team of the solution is almost unimaginable. But with BMC, this is one of the best parts. If your issue is not resolved by the support team within a stipulated time period, they immediately reach out to their development team and they usually fix the problem.

How was the initial setup?

The initial deployment depends on the customer environment. If the environment is small or medium, the solution can be deployed fairly quickly, and similarly if the customer wants to deploy a standalone setup. But for a large customer, especially for customers who want to deploy the solution in a clustered environment, in a high-availability environment, or even in a DR environment, it's very complex to set up initially and it takes a fairly large amount of time to implement.

The initial setup means setting up the components, setting up the basic monitoring. The advanced configurations take extra time. For a small or medium environment, we can do the initial setup in a couple of weeks. A small to medium environment is where they are monitoring between 50 and 300 or 400 servers and IT infrastructure components, such as storage devices or hardware.

If you go above a few hundred devices, it becomes a large environment. For a large environment, it may take anywhere between two and four months to set up, depending on what kind of deployment the customer prefers: whether they want high availability, a clustered setup, or a disaster recovery setup.

We do have standardized deployment configurations for customers and we recommend that customers use them. We are BMC's most prominent partner in the Middle East, so we have done quite a few deployments and we have created standard templates for deployment, for small, medium, and large customers. Generally, the customers leave it to us to decide the implementation strategy and then we use our standard deployment template for the given environment, and that makes things much smoother and faster. We already know which component to install when, what configuration should be done, and how much time it should take, ideally. And tasks can be initiated in parallel, like agent installations.

What's my experience with pricing, setup cost, and licensing?

I would advise that you really give a lot of thought to how much you want to monitor and what the anticipated growth in monitoring requirements will be. These things should be considered in the planning phase and, accordingly, you should decide what type of environment to set up.

The licensing depends on the data streams and the event streams. If you are monitoring all the metrics for the monitored devices, the data streams and event streams will increase multifold as well. Therefore, filtering is very important in TrueSight. If you are monitoring the memory utilization for a server, for example, that alone has 20-plus attributes in TrueSight. If you let in all 20 attributes, the number of data streams will increase. If you're really interested only in the utilization metric, you may also be monitoring 19 metrics that you are not interested in and they will add to the data stream and the licensing cost will increase.
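A rough Python sketch shows why filtering matters for the stream count. It assumes, purely for illustration, that each monitored attribute on each device counts as one data stream; the real licensing metric should be confirmed with BMC. The attribute names and the device count of 300 are hypothetical.

```python
def filter_attributes(collected, wanted):
    """Keep only the attributes named in the allow-list."""
    return {name: value for name, value in collected.items() if name in wanted}


def stream_count(device_count, attributes_per_device):
    """Assumed model: one data stream per attribute per device."""
    return device_count * attributes_per_device


# Twenty hypothetical memory attributes collected for one server.
memory_attributes = {f"memory_attr_{i}": 0.0 for i in range(1, 20)}
memory_attributes["memory_utilization_pct"] = 71.5

unfiltered = stream_count(300, len(memory_attributes))
filtered_view = filter_attributes(memory_attributes, {"memory_utilization_pct"})
filtered = stream_count(300, len(filtered_view))
print(unfiltered, "->", filtered)
```

Under that assumed model, keeping only the utilization metric on 300 servers cuts the memory-related streams from 6,000 to 300.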

Consider scalability very carefully: how much you want to monitor and what components are very important. Then, depending on these two things, filter out unwanted metrics or attributes. If you do a good job at filtering the data, then your licensing costs will be manageable.

I'm not aware of the details of the licensing models of TrueSight's competitors, but our business team says that the cost of using TrueSight is higher compared to its competitors. That said, the cost often comes down to the filtering and the sizing. The filtering has to be done very carefully to bring down the licensing costs.

The licensing model is good and fairly self-explanatory. It's not very complex.

There are different pieces which are licensed separately. For example, Service Impact Management and Application Performance Management are licensed separately. Large customers buy the entire solution with all the features but they don't necessarily use all the features, especially the Service Impact Management. The latter is very difficult to implement and to get value out of. My advice is to consider what features of the solution you are going to use and then just pay for those features, instead of paying for everything without even using it.

Which other solutions did I evaluate?

Without naming particular competitors, I can give you general pros and cons of TrueSight Operations Management, when compared with them.

One of the pros of TrueSight Operations Management is the breadth of the IT infrastructure monitoring capabilities. TSOM can actually monitor any component of your IT infrastructure, along with your applications. It does very deep-dive monitoring and you have many more metrics, compared to any other solution, as far as I'm aware. It gives you more in-depth diagnostics and performance data. 

Also, the support from BMC Software is better than that of its competitors.

The complexity of implementing TSOM — the number of components required to set it up and the number of servers you need — is one of the cons. And the number of different agents you need to monitor different things is another con.

What other advice do I have?

TrueSight, as a solution, is a very large suite nowadays. In the last year or so, BMC has made the Orchestration module a part of the TrueSight portfolio. Then there are the Server Automation, Network Automation, and BladeLogic Client Automation pieces that are merged into the TrueSight portfolio. The entire TrueSight product suite includes TrueSight Operations Management (Infrastructure Management and Application Performance Management), TrueSight Capacity Optimization, TrueSight Orchestration, and TrueSight Automation. If you combine all these solutions, you can see business innovation. You can automate a lot of mundane and repetitive tasks. You can automate a lot of administrative functions. You can integrate a lot of different components using Orchestration, and that helps reduce the human cost involved. You can then use your human resources for more productive or more creative tasks, rather than repetitive activities. So TrueSight can help businesses to innovate.

Overall, I would rate the solution at eight out of 10.

Disclosure: My company has a business relationship with this vendor other than being a customer: Reseller.