What is our primary use case?
Senior management wants to understand, when there is an outage, what the impact of that outage is across the company's entire suite of products. We have an Event Manager that integrates all of our monitoring tools. Since we are a large company, we have about 26 different monitoring tools in use. The idea is to get all of them into a framework that can feed a model displaying the impact of an outage.
How has it helped my organization?
We have one application, which is fairly large. In the past, we had Level 1 and 2 NOC support teams who were responsible for watching dashboards. When they saw an issue in the application, they would call Level 2 or 3 support and escalate the call, if necessary. Now, through the use of this product, we have been able to reduce the headcount by five people, as we are able to eliminate the eyes on the glass. We no longer have people watching the dashboard. Events are processed automatically through the system and routed to the right people. We had six people in L1, and now have one. So, we eliminated five of six positions, which is pretty significant.
Also, it used to take an average of 45 minutes before we had the right engineer on the line, fixing the problem. Now, it's probably three to five minutes.
The solution affected our end user experience management very positively. Our application teams are very excited about what we're doing with the reduction in headcount. More importantly, the automation that it has brought to us has streamlined so many manual tests. The teams are very happy with the way things are going.
The solution will help us maintain the availability of our infrastructure across a hybrid or complex environment. Right now, we can get to an event scenario or problem quicker than we used to. We are right on the cusp of releasing our service impact modeling. This will help us tremendously because we have a multicloud, as well as an on-premise environment. Any component should show the impact across its applications, regardless of where it's located. It has definitely helped in these environments.
We have improved our ability to get to a root cause because of the way their tools work. When a problem happens, it lights up a certain model in red. If you follow the diagram down to the lowest member of the tree, you'll see which component sits at the bottom. So, if it's a database saying, "I'm out of disk space," then it may create all types of chaos. Following that tree down, you'll see the lowest level is the database server, and it has a disk space event. Right there, that's the root cause of all your application issues. So, it has helped us get to the root cause more quickly.
We're just now gaining momentum on the adoption of this product. We have seen that with a database out of disk space, because we can get to the root cause quicker, the problem can be remediated faster, and we can also reduce the number of people who have to be on outage calls. There is no need to have network people on a call if it's a database issue. We let them deal with other things, so our operation becomes more efficient. The database people know exactly what the problem is, and quickly.
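As a rough illustration of following the tree down, here is a minimal Python sketch. It is not how the BMC tooling is implemented. The dependency model, the component names, and the `root_causes` function are all hypothetical, and it assumes the set of components currently shown in red is already known.

```python
from typing import Dict, List, Set


def root_causes(model: Dict[str, List[str]], red: Set[str], top: str) -> List[str]:
    """Walk down from `top` through red components only.

    `model` maps each component to the components it depends on (hypothetical data).
    A red component with no red dependencies is the lowest member of its branch,
    so it is reported as a likely root cause.
    """
    found: List[str] = []
    seen: Set[str] = set()

    def walk(node: str) -> None:
        if node in seen:
            return
        seen.add(node)
        red_deps = [dep for dep in model.get(node, []) if dep in red]
        if node in red and not red_deps:
            found.append(node)
        for dep in red_deps:
            walk(dep)

    walk(top)
    return found


# Hypothetical service model: the application depends on an app server and the
# network; the app server depends on the database server.
model = {
    "billing_app": ["app_server", "network"],
    "app_server": ["db_server"],
    "db_server": [],
}
red = {"billing_app", "app_server", "db_server"}  # components lit up in red

print(root_causes(model, red, "billing_app"))
```

In this made-up model the walk ends at `db_server`, while `network` is never visited because it is not red.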
What is most valuable?
The most valuable feature is the event management piece of it. We have it integrated with a number of our different products. Thus, we can send events into a single Event Manager, which will create a Remedy ticket for us. This is a huge feature for us.
We have 26 different monitoring tools. The way this product works allows us to define a custom event call. We can take all of our monitoring tools, and say "If you can put an event into this specific format, then we have a way of creating a common event across all of our monitoring tools." By doing that, we have a single back-end process that acts on all of the events. So, we only do a data transformation upfront when we are receiving events. This simplifies our back-end.
The solution has helped to reveal underlying infrastructure issues affecting app performance. We constantly have network issues. The network team had been capturing them, but they weren't integrated into any impact model. By integrating them into an impact model, we can now catch them and see their impact on our applications.
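To make the pattern concrete, here is a minimal Python sketch of transforming events upfront into one common format and then running a single back-end process. It does not use any BMC API. The tool names (`tool_a`, `tool_b`), the field names, the severity mapping, and the `handle` logic are all assumptions made up for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class CommonEvent:
    """Hypothetical common event format shared by every monitoring tool."""
    source: str
    host: str
    severity: str
    message: str


def from_tool_a(raw: dict) -> CommonEvent:
    # Assumed payload shape for an imaginary "tool_a".
    return CommonEvent("tool_a", raw["hostname"], raw["level"].upper(), raw["text"])


def from_tool_b(raw: dict) -> CommonEvent:
    # Assumed payload shape for an imaginary "tool_b" with numeric priorities.
    severity = {1: "CRITICAL", 2: "MAJOR", 3: "MINOR"}.get(raw["prio"], "INFO")
    return CommonEvent("tool_b", raw["node"], severity, raw["summary"])


# One small adapter per tool; this is the only tool-specific code.
ADAPTERS: Dict[str, Callable[[dict], CommonEvent]] = {
    "tool_a": from_tool_a,
    "tool_b": from_tool_b,
}


def normalize(tool: str, raw: dict) -> CommonEvent:
    """Transform a tool-specific payload into the common format on receipt."""
    return ADAPTERS[tool](raw)


def handle(event: CommonEvent) -> str:
    """Single back-end process: the same rule applies whatever the source tool."""
    action = "open_ticket" if event.severity in {"CRITICAL", "MAJOR"} else "log_only"
    return f"{action}:{event.host}:{event.message}"


event = normalize("tool_a", {"hostname": "db01", "level": "critical", "text": "disk full"})
print(handle(event))
```

Adding another monitoring tool in this sketch means writing one more adapter; `handle` stays unchanged.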
What needs improvement?
It's a complex system. The implementation is fairly challenging. They have done a good job lately of getting videos out there. We would like more videos and self-training, though. Right now, you have to go to BMC's training classes to get a good understanding of the product, and those training classes are very expensive. While I understand they are a business and trying to make money, a lot of their competition has training available via YouTube. There is much more accessibility to competitors' training.
In a company of our size, we need multiple people trained. So, I have to take the training classes. Then, I have to go and train the rest of my organization. I would prefer to say to the other people on my team, "Go to this link and..." Or, "Here's a list of training sessions that you can go to which are online and that are free." Personally, I think it would help the adoption of their product in the marketplace.
It's a far more complex technology to deploy than I perceived at the beginning. I would have thought that the integration between their products would have been more seamless than it has been. This is what has made it a lot more complex than I anticipated.
From a technical standpoint, some of their products still have a dependency on Oracle Databases, and they are very well integrated in the cloud for a lot of their components. There is another database technology called Postgres, which they are partially integrated with. However, if they were to get all of their platforms integrated into Postgres, it would be much less expensive for companies, such as mine, to go to high availability, etc. The architecture really needs to be upgraded. I know they're doing a lot of this, but they need to keep doing it, and accelerate their process, so they can remain competitive.
For how long have I used the solution?
We have been working with the product for the last year. We went live with the product in April.
What do I think about the stability of the solution?
Stability of the product is about a seven out of 10. As far as stability goes, it has mostly been very good. With some of the newer stuff on 11.3, we often have to call support and get a patch sent to us because certain things just don't work. Those pieces would have hurt stability, but once you get it running, it's very good.
What do I think about the scalability of the solution?
The overall scalability of its platform and the ability to support its website is pretty good. We have a couple of people on our team who seem pretty proficient at it. They can do things rather quickly.
I don't use PATROL. It is pretty good if you use their native products, like PATROL for monitoring. We integrate other monitoring tools into TSOM, so we don't use PATROL. I am familiar with it though, and I have been trained on it. I feel like it's pretty labor-intensive to manage. For example, if I have a number of different classes of servers, there are a lot of screens that I have to fill out, deploy, and push out to my systems. There has to be a more efficient way to do this. My company is always pressuring us to be more scalable. It is not very scalable in the administration of its monitoring. It could be better.
For TrueSight Operations Manager, there are a limited number of people who use it, no more than 15 to 20 system administrators and support personnel, who are mostly in administrative functions. The reason so few users utilize the system is that all the events are automated. Most of our support teams and users look at Remedy, and there are over 3000 users looking at Remedy. So, a lot of the users of our overall system have no need to look at a TrueSight console. Their work is done through the way we have designed the system. They get a Remedy ticket and what's called a PagerDuty notification. When they get those two things, they know that there's an issue, and all the information is contained within those two systems. They don't need to go to the TrueSight console.
How are customer service and technical support?
The technical support team is very good. I wish there were more of them. This includes the ones who I work with on the phone, as well as their field technical people. They are very good.
I don't know if their technical support differs from their project team, but we are constantly revolving people in and out of our project because they get different assignments within BMC. Thus, I wish there were more technical support people who had more longevity on our account. We will have a CMDB person assigned to us on the project from BMC, but in just three weeks, we'll find out, "Oh, that person has been reassigned, and they have to go to another account, where they have to do something different." We are constantly having to retrain people coming in from BMC. So, there is no permanence with their people on our projects.
This issue of changing technical staff is not limited to BMC. However, their resource pool seems sort of small.
We are constantly facing issues with having to call support because things didn't work as we expected them to, and I don't know why that is. We use the BMC Atrium CMDB product (service impact model), and publishing service impact models seems to be challenging and problematic. We are constantly calling support, who give us a bug fix, which fixes the problem. However, if somebody already knows how to fix a bug, it shouldn't have existed in the first place.
Which solution did I use previously and why did I switch?
BMC is one of our longest running partnerships. We have been using Remedy for many years. We have been using parts of this system since 1998. However, we have never put it all together in the way that we're doing now. We didn't replace anybody else. We had used their products before, but not to their full advantage.
How was the initial setup?
It's a complex system. We were dealing with a highly customized Remedy system which caused us a lot of issues. We had to wait for a Remedy upgrade to occur before we could deploy our systems. We were at this for about a year, and most of that time was waiting to get the Remedy implementation in place. Once the Remedy implementation and upgrade were completed, there were a lot of challenges with our CMDB data and the integration of the CMDB to a service model along with the publishing of a service model.
We have Remedy, a service model, TrueSight Operations Manager, and TSIM. With a lot of technologies in play, making them all work together has been challenging, since each one of them is a fairly sophisticated technology. BMC could do something to make it easier.
It took about three months to deploy the core technology which solved our problem. We waited a very long time, over a year, on the Remedy upgrade. However, this was because our company had highly customized the prior Remedy version. Without that in the equation, the technology took us around three months to deploy.
We are still enhancing it. That time frame was just to get it deployed. Making full use of it and getting the full benefit will take well over a year. Both the technology and the organization using it need to mature.
Right now, four or five of our core products are monitored and feeding this environment. Because we've been successful at it, we anticipate integrating more of our products and the monitoring of those products into our system. We have already built the integrations for the different monitors. It is just getting the different teams to want to use this system. That's why it's an organizational maturity thing. We could take them on very quickly, but there has to be a willingness on their part to do so. Part of our strategy is to make them want to use this system. That's on the event side.
On the service impact side, we're working with senior management. This Friday, we have a demo of this technology with the CIO, because he is the one who is putting the pressure on the different application teams to onboard with us.
We have a multiyear onboarding strategy, where we're onboarding more applications and integrating them into this particular environment. Today, they are being monitored by their own support teams, who are now beginning to see the success that we are having. The challenge that we are having organizationally is, when we onboard their applications, we expose the issues of their products through Remedy tickets and outages. A lot of times, these teams want to hide that. So, we have political issues, as well as technological hurdles to deal with.
What about the implementation team?
We did a lot of it ourselves, since we had the knowledge in-house, specifically on the event management side. We were an ADDM environment. So, we had bits and pieces of technology knowledge in our company, but in order to pull it all together, we used Wipro, as well as BMC in India to drive this. That's still the case. We're still using them to get this whole thing deployed in various pieces.
Overall, our experience with BMC and Wipro has been positive. However, there have been challenges because the technical people have moved from our account to another account. We have a rotating team of people, which gets very challenging for continuity.
For deployment and maintenance of TrueSight, we need about four people. For the whole enterprise solution, we need 25 people for 24/7/365 support.
What was our ROI?
We have reduced headcount and shrunk the mean time to resolve. That's how we justify the expense of the product. It has really worked out.
It has helped us reduce IT Ops costs. We were able to eliminate five of six Level 1 technician positions. We repurposed those people to higher level tasks. Without this solution, we would not have been able to replace that job function.
What's my experience with pricing, setup cost, and licensing?
We did a five-year, multimillion dollar deal.
We haven't licensed the solution's machine-learning and analytics to deploy artificial intelligence for IT ops.
Which other solutions did I evaluate?
We looked at some of their competitors, but because of the technology and the base of knowledge that we already had in place, it made sense to stick with BMC. We decided to focus on making their products integrate the way they were supposed to and were designed to. That's what we've been doing since we had the knowledge and license in-house.
We also evaluated ServiceNow and BigPanda.
On the pros side, BMC and ServiceNow were very similar products. My biggest concern with BMC was they seemed to be declining in market penetration versus ServiceNow, which has been expanding considerably over the last several years. That was my biggest concern with moving forward with BMC. The pros with BMC were that we already had the knowledge in-house and the technology was proven for us. We knew it was fairly solid, so we felt confident that we would be successful with it.
One of the other differences between the two companies is the marketing organization from ServiceNow was a lot more consistent than from BMC. We probably get more calls even today from the ServiceNow account rep than we do from our BMC team. They show up every once in a while, and they do a big dog and pony show, then they go away for a bit. So, I don't think their marketing is as strong as it should be, or we're not a big enough customer for them. However, with the amount of investment we have in their product, they should be around more often.
There is one piece of BMC technology that we decided not to use. That's their Atrium Orchestrator. We use a different third-party orchestrator called Ayehu. We just found the Atrium Orchestrator from BMC to be too complex.
What other advice do I have?
Make sure you have knowledgeable people on your staff. Give yourself plenty of time for deployment. If you think it will take three months, make it six months. Look at past companies' experience on time to deploy, knowledge, and staffing requirements.
The solution's event management capabilities are very good. In some ways, they are based on very old technology. I first started using it way back in the late nineties, and the basic core of the product does not appear to have changed much since then. Back then, it was a very good product, so that's not necessarily a bad thing. The other thing that the company has done since then is enhance the website portal, which I have a very positive impression of.
The website is fairly new, and it could be a little bit better. However, if I were to compare it to some of the other tools out there, it has a much nicer GUI and presentation. The web presentation is much more advanced than BMC's TSOM server.
We still have multiple panes of glass. For example, we have an Event Manager screen along with a Remedy screen. We're getting closer to a single pane of glass and have fewer panes of glass. Where we had a lot of dashboards before, we now don't have any, as we've replaced all of them. So, there are no panes of glass in our support. If you are a support person at our company, you are not looking at a screen. Instead, you are looking at your cell phone, because we reach out to you when there's a problem and you don't have to look at anything.
We are using it on about five percent of our environment. We have what is called a limited deployment right now, because we have so much integration and automation going on. We needed to mature the support teams and the rest of the organization as a whole in what we're doing. Once we have achieved that, I anticipate that 100 percent of our applications will be feeding this system. After that, we will greatly extend our use.