BPPM Implementation Considerations
Part 1: Meet your business requirements
Three years after BMC ProactiveNet Performance Management (BPPM) was released, most BPPM customers have reached the conclusion that a BPPM implementation is more than just software installation. But what makes a BPPM implementation successful? What do you need to consider before diving into installation details?
The "BPPM Implementation Considerations" blog series will try to address several important considerations at the requirement level and the architecture level. Implementing BPPM is a lot like building a house. Many of these considerations are like the foundation of the house: they need to be determined at the very beginning.
The most important consideration in a BPPM implementation is your business requirements. The management of your organization, your entire implementation team, and other stakeholders should have a clear understanding of the list of business requirements that your BPPM implementation is expected to meet. Then you will need to translate this list of business requirements into a list of technical requirements, each assigned a category such as mandatory, strategic, cost-saver, or nice-to-have.
Only then can you map each technical requirement to a list of detailed BPPM features and prioritize the implementation of each feature. This will become your project scope. Based on your project scope, you can plan your project timeline and budget. If you outsource your BPPM implementation to a consulting company, it is critical that you do your homework on your business requirements and technical requirements first. Then work closely with the architect (not just the project manager) of the consulting company to determine the project scope.
However, many new BPPM customers I have talked to seem to do it backwards. They came up with a budget first, without knowing exactly which BPPM features to implement or how long the implementation would take. Then they picked a list of BPPM features to implement from the product datasheet without knowing how each feature related to their business bottom line.
As an example, here is the process followed at one of my past clients. One of the top business requirements was to cut the cost of Remedy gateway licenses from multiple monitoring software vendors. This was translated into the following technical requirement: alerts from multiple monitoring tools must be integrated into one alert management tool that communicates with Remedy for ticket creation. This requirement was categorized as a cost-saver. The technical requirement was then mapped to these BPPM features: event-to-BPPM-cell integration through API and SNMP traps, msend API installation, SNMP trap adapter high-availability implementation, custom BPPM cell MRL rules to process events from multiple vendors, IBRSD high-availability implementation, and event-to-ticket categorization in the BPPM cell. The return was a 6-figure annual license saving, year after year, for an investment of a 5-figure consulting fee. This ROI went straight to the business bottom line.
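The chain from business requirement to technical requirement to BPPM features can be kept in a small traceability structure so that every feature in scope points back to the business reason for it. The following is a minimal sketch, not a BMC tool: the class name, the function name, and the priority ordering of the categories are assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple

# Categories named in the text. The numeric ordering is an ASSUMPTION;
# adjust it to whatever your stakeholders agree on.
CATEGORY_PRIORITY = {"mandatory": 0, "strategic": 1, "cost-saver": 2, "nice-to-have": 3}


@dataclass
class TechnicalRequirement:
    """One technical requirement traced back to its business requirement."""
    business_requirement: str
    description: str
    category: str
    features: List[str] = field(default_factory=list)


def project_scope(requirements: List[TechnicalRequirement]) -> List[Tuple[str, str, str]]:
    """Return (category, feature, business requirement) rows ordered by category priority."""
    ordered = sorted(requirements, key=lambda r: CATEGORY_PRIORITY[r.category])
    return [(r.category, f, r.business_requirement) for r in ordered for f in r.features]


# The client example from the text, expressed in this structure.
remedy_gateway = TechnicalRequirement(
    business_requirement="Cut the cost of Remedy gateway licenses",
    description="Integrate alerts from multiple monitoring tools into one alert management tool",
    category="cost-saver",
    features=[
        "Event integration through API and SNMP traps",
        "msend API installation",
        "SNMP trap adapter high availability",
        "Custom MRL rules for multi-vendor events",
        "IBRSD high availability",
        "Event-to-ticket categorization",
    ],
)

for category, feature, reason in project_scope([remedy_gateway]):
    print(f"{category:12} {feature}  <- {reason}")
```

A feature that cannot be placed in a row like this has no stated business reason, which is a useful signal when trimming scope.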
Part 2: Keep the total cost of ownership in mind
When you build a house for yourself, you don't just consider the cost of building; you also consider the cost of maintaining the house and the utility bills once you live there. Similarly, when you implement BPPM, you need to keep the total cost of ownership in mind in addition to the implementation cost.
After talking to several BPPM customers, I noticed that their operations teams are all at least twice the size of the teams at my clients, just to keep BPPM operations going. What is worse, their operations teams also need implementation skills to constantly patch up the implementation.
Before you even start implementation, consider the following aspects:
1) Scalability: When your environment grows with more servers, more applications, or more integrations, will your architecture still work? How easy would it be to split horizontally (based on processing steps) and vertically (based on incoming traffic)?
2) Upgrade: What can you do right now to make future upgrades easier? You may want to consider adopting a naming convention, saving configuration in a separate repository, and documenting everything consistently.
3) High Availability: High availability not only helps with business continuity, it also keeps your team from constantly fighting fires. You have several options for high availability: application-level failover, OS-based failover, active/active load balancing, or duplication. Which option would best fit your needs for each BPPM component, and how much would it cost? For example, native application-level failover might be your best choice for BPPM cells if your business cannot afford to miss a server-down alert. But a simple duplication of the PATROL 7 console is probably sufficient compared with OS-based failover, which would cost nearly twice as much.
4) Implementation Repeatability: Do you keep an accurate implementation document so that the installation and configuration of each BPPM component is repeatable? You need to implement everything on a test system first and carefully document everything as you go. Production deployment should then be a straightforward 'follow the doc' process. The production deployment also gives you a perfect opportunity to update the implementation document with anything you have missed.
A common mistake I have seen is to start the implementation directly on a production system. After several months of figuring things out, the system finally goes live with many junk files sitting under the implementation directory. Then you realize that you actually need a test system, because you cannot make and test changes otherwise. Now you don't know how to configure your test system to make it identical to your production system, since you have lost track of what made the production system work and what did not.
5) Operations Standardization: Do you have a standard operating procedure document? For example, if a new server is added to your PeopleSoft Payroll application, do you have a document containing the steps for the operations team to add that server to PATROL, the BPPM integration service, the BPPM cell, the BPPM server, the BPPM GUI, and automated Remedy ticketing?
Part 3: Achieve the highest ROI through integration
In addition to monitoring solutions from BMC, most enterprises nowadays also use monitoring software from other vendors, open source tools, and even home-grown scripts scheduled by cron jobs. Having a group of NOC operators watching the GUIs of all the monitoring software in a NASA-like environment is simply not efficient. What is worse is having to pay a license fee for each monitoring tool to connect to the back-end ticketing system.
The BPPM/BEM cell provides an extremely flexible and robust API and adapters to integrate with just about any monitoring software out there. Whether you are running monitoring tools from other commercial vendors such as IBM and Microsoft, or you use open source tools like Nagios, it is fairly straightforward to integrate alerts from these tools into the BPPM/BEM cell using either its OS API or its SNMP adapter. If you use home-grown scripts, all you need to do is add an API call at the end.
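For a home-grown script, the "API call at the end" is typically a call to the msend command-line client mentioned in Part 1. The sketch below wraps that call in Python. The flags (-n cell, -a event class, -r severity, -m message, -b extra slots) are written from memory of common msend usage and should be treated as assumptions to verify against the msend reference for your version; the cell name, host name, and the mc_host slot in the example are also illustrative.

```python
import subprocess
from typing import Dict, List, Optional


def build_msend_command(cell: str, event_class: str, severity: str, message: str,
                        slots: Optional[Dict[str, str]] = None) -> List[str]:
    """Build the msend argument list without running anything."""
    cmd = ["msend", "-n", cell, "-a", event_class, "-r", severity, "-m", message]
    if slots:
        # Extra slots are joined into one "name=value;name=value" string.
        # Values containing spaces or semicolons would need quoting; not handled here.
        cmd += ["-b", ";".join(f"{name}={value}" for name, value in slots.items())]
    return cmd


def send_event(cell: str, event_class: str, severity: str, message: str,
               slots: Optional[Dict[str, str]] = None) -> int:
    """Run msend (it must be on PATH) and return its exit code."""
    cmd = build_msend_command(cell, event_class, severity, message, slots)
    return subprocess.run(cmd, check=False).returncode


if __name__ == "__main__":
    # Last step of a hypothetical nightly backup script: report a failure to the cell.
    print(build_msend_command("cell1", "EVENT", "CRITICAL",
                              "Nightly backup failed", {"mc_host": "db01"}))
```

Keeping the command construction separate from the call makes it easy to log or dry-run what a script would send before pointing it at a production cell.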
If your back-end ticketing system is Remedy, the out-of-box 2-way integration (IBRSD) between the BPPM/BEM cell and Remedy is more efficient than the Remedy gateways for other monitoring tools. It is fairly straightforward to configure two instances of IBRSD as active/active failover, so your chance of waking up at 3am to fight fires is very slim. Since the license for IBRSD is included in the price of BPPM/BEM, you instantly cut costs when you stop paying for the Remedy gateway licenses for other monitoring tools.
Other added benefits include reduced maintenance effort for other monitoring software, less customization in Remedy, consistent ticket information for all monitoring tools, and possible event correlation between events from different monitoring tools. You will also make your NOC team's job easier.
I understand that it is not always easy to convince people who work on other monitoring software to integrate with BPPM/BEM, due to organizational silos and technical complexity. It is important to pick the right candidate for the first BPPM/BEM integration. Once the ROI is obvious, people will become more supportive of BPPM/BEM integration. In addition, it is important to set up a consistent framework for all integrations, since BMC does not provide a standard for integration. Once you have set up a consistent framework for one-way and two-way integration, your next integration will become much easier.
At one of my past clients, it took our BPPM/BEM team three months to work with the other team to finish our first integration, because the integration project had the lowest priority with the other team. Once everyone saw how well the integration worked and how much in license fees it saved, our second integration took only four weeks to finish. Subsequently, our third integration took only three days.
Part 4: Monitor the monitors
The purpose of BPPM is to monitor your IT infrastructure. It is
important that the monitors themselves are up and running all the time.
A good BPPM implementation not only monitors your IT infrastructure; it also monitors each and every BPPM component, including the BPPM server, BPPM agent, BPPM cell, PATROL agent, PATROL adapter service/process, SNMP adapter service/process, IIWS service/process, IBRSD service/process, and so on. The self-monitoring metrics include component status and connection status.
Events alerting that a BPPM component or a BPPM connection is down are mostly sent to the connected BPPM cell automatically. Some of the self-monitoring events require a quick activation step. You need to identify those events, as they have different event classes and message formats, and you need to notify the right people about them.
Some components can be monitored in multiple ways, and you just need to pick the one that works best in your environment. For example, when a PATROL agent loses its connection with the PATROL Integration Service, you can see an event sent directly from the PATROL agent, another event from the PATROL LOG KM if you configured it to monitor the IS connection-down log entry, and yet a third event from the PATROL Integration Service if you activated it in the BPPM GUI.
You may need to reword the message of a self-monitoring event for better readability, as some messages are not clear at all. For example, by default, the PATROL agent connection-down event contains the following slot:
msg='Monitored Cell is no longer responding';
You may want to reword the message to look like this:
msg='PatrolAgent@firstname.lastname@example.org:3181 is no longer responding';
because it is the PATROL agent that is no longer responding, not the cell.
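In a cell, this kind of rewording would normally be done by a rule in the knowledge base. The logic itself is small, and the Python sketch below shows it outside the product. The slot names agent_host and agent_port are hypothetical stand-ins for whichever slots carry the agent's address in your events, and the host name in the example is made up.

```python
DEFAULT_MSG = "Monitored Cell is no longer responding"


def reword_agent_down(event: dict) -> dict:
    """Return a copy of the event whose msg names the PATROL agent, not the cell.

    Events with a different message, or without the (hypothetical)
    agent_host / agent_port slots, are returned unchanged.
    """
    host = event.get("agent_host")
    port = event.get("agent_port")
    if event.get("msg") != DEFAULT_MSG or host is None or port is None:
        return event
    reworded = dict(event)  # leave the original event untouched
    reworded["msg"] = f"PatrolAgent@{host}:{port} is no longer responding"
    return reworded


if __name__ == "__main__":
    raw = {"msg": DEFAULT_MSG, "agent_host": "host1.example.com", "agent_port": 3181}
    print(reword_agent_down(raw)["msg"])
```

Returning unrelated events unchanged keeps the rewording from touching messages that happen to arrive through the same path.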
For the notification method, the most reliable way is local email fired from the cell that receives the self-monitoring events. Since your path to the ticketing system may be down when your BPPM components are experiencing problems, your back-end ticketing system should not be the only way to send notifications for your self-monitoring alerts. It should be used in addition to your local email notification.
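The ordering described above (email always, ticket as an extra that is allowed to fail) can be sketched as follows. This is not how the cell itself sends mail; it only illustrates the design using Python's standard library. The function names, the addresses, and the injected send_email and create_ticket callables are assumptions for illustration.

```python
import smtplib
from email.message import EmailMessage
from typing import Callable, List, Optional


def build_alert_email(event: dict, sender: str, recipients: List[str]) -> EmailMessage:
    """Turn a self-monitoring event (a dict of slots) into an email message."""
    message = EmailMessage()
    message["From"] = sender
    message["To"] = ", ".join(recipients)
    message["Subject"] = f"[BPPM self-monitoring] {event['severity']}: {event['msg']}"
    message.set_content("\n".join(f"{k}={v}" for k, v in sorted(event.items())))
    return message


def send_via_local_smtp(message: EmailMessage, host: str = "localhost") -> None:
    """Send through a mail relay on the same server as the cell (assumed to exist)."""
    with smtplib.SMTP(host) as smtp:
        smtp.send_message(message)


def notify(event: dict, sender: str, recipients: List[str],
           send_email: Callable[[EmailMessage], None],
           create_ticket: Optional[Callable[[dict], None]] = None) -> str:
    """Email first; ticketing is attempted afterwards and may fail without losing the alert."""
    send_email(build_alert_email(event, sender, recipients))
    if create_ticket is None:
        return "email-only"
    try:
        create_ticket(event)
    except Exception:
        # The path to the ticketing system may be the very thing that is down.
        return "email-only"
    return "email+ticket"
```

In production, send_email would be send_via_local_smtp or its equivalent, and create_ticket would be whatever forwards the event to the ticketing integration. The key point is that a ticketing failure never blocks the email.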
Part 5: Customize at the right place
Unless you are a very small business, you will need to customize BMC
out-of-box solutions to address the particular issues in your IT
environment. It is unrealistic to expect a one-size-fits-all solution
from BMC. Fortunately, BPPM was developed with customization in mind. It
provides extensive tools to help you develop your own solutions that
seamlessly extend BMC out-of-box solutions.
The BPPM suite has three major components: BMC ProactiveNet, BPPM Cell (BEM), and PATROL. Both BPPM Cell and PATROL are more than 10 years old. One of the primary reasons they are still going strong today is that they both allow you to add your own solutions to them seamlessly.
Before you start developing your own custom solutions, take a step back to think about what options you have and where you should place your customization. What would be the impact on accessibility and resource consumption on the underlying servers? What would be the impact on deployment of your custom solutions? What would be the impact on future maintenance and upgrades?
In PATROL, you can develop custom knowledge modules and you can also plug in your own PSL code as a recovery action into a parameter. In BPPM Cell, you can develop your own event classes, MRL code, dynamic tables, and action scripts to extend the out-of-box knowledge base.
In general, if you have a choice between customizing PATROL and customizing BPPM Cell to manage events, customizing BPPM Cell would require less effort and have less impact on the servers that are being monitored. Here are a few reasons:
1) PATROL runs on servers that you don't own, have limited access to, and may not be familiar with. For example, I was recently helping a client debug a custom KM running on an AS400. I had to get help from the AS400 sysadmin just to add one line to its PSL code.
2) PATROL often shares the server with mission-critical applications. Poorly written PSL code could negatively impact those applications.
3) The same custom knowledge module may need to be running on more than one server, thus requiring more time to deploy and upgrade.
4) BPPM Cell runs on your own infrastructure server. It is highly scalable because cells form a peer-to-peer architecture. If resources ever become an issue, you can add more cells, either on the same server or on a different server (even one with a different operating system). You can split a cell horizontally by processing phases, or you can split a cell vertically by event sources.
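The two ways of splitting a cell can be pictured as two small routing tables: a vertical split chooses the entry cell by event source, and a horizontal split passes an event from one processing phase to the next. The sketch below is purely illustrative; the cell names, source names, and phase names are all hypothetical, and in a real deployment the routing would be configured in the cells themselves.

```python
from typing import Optional

# Vertical split: one entry cell per event source (hypothetical names).
VERTICAL_ROUTES = {
    "patrol": "cell_patrol",
    "nagios": "cell_nagios",
    "snmp": "cell_snmp",
}
DEFAULT_ENTRY_CELL = "cell_misc"

# Horizontal split: cells chained by processing phase (hypothetical names).
HORIZONTAL_PIPELINE = ["cell_intake", "cell_correlation", "cell_ticketing"]


def entry_cell(event_source: str) -> str:
    """Vertical split: pick the cell that receives events from this source."""
    return VERTICAL_ROUTES.get(event_source.lower(), DEFAULT_ENTRY_CELL)


def next_cell(current_cell: str) -> Optional[str]:
    """Horizontal split: return the cell for the next phase, or None at the end."""
    position = HORIZONTAL_PIPELINE.index(current_cell)
    if position + 1 < len(HORIZONTAL_PIPELINE):
        return HORIZONTAL_PIPELINE[position + 1]
    return None


if __name__ == "__main__":
    print(entry_cell("Nagios"))
    print(next_cell("cell_intake"))
```

Adding capacity then means adding a row to one of the tables rather than touching the monitored servers, which is the maintenance advantage argued for above.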