What is our primary use case?
It is for application monitoring. We use it for basic infrastructure, process up/down, and log file monitoring to create metrics and alerts. We then extended its use to create more of a system management console, so we can stop and start a process from the console without having to log on to the box to make the change.
Across the bank, every team needs to provide a Ready For Business status, which is shown on a centralised web page for ease of viewing. As the bank is very large, it is down to the application teams to choose which tool they use, and in this instance individuals would manually update the web page status for their area. We used Geneos to start automating that process. This helps provide a holistic view of health across all applications and, due to the nature of Geneos, also enables downstream applications to understand the health of their upstream providers.
Geneos uses an agent that runs on the host. The agent collects the information and passes it through to a console, where analysts can see all the information coming through.
How has it helped my organization?
We started linking applications together. As a large company with a lot of individual support teams, most teams support their own application. However, from a business flow point of view, the business requires those applications to work together for the business to actually work. We have been able to link the applications together, so we can see the health of applications that are two or more steps removed from where we are. During an incident, people first look at their own application, realize there is nothing wrong with it, and then go and ask the upstream application team. Now, they can see whether the application three steps before theirs has a problem, which is probably the reason why their application is not working as expected.
The solution is used across the entire investment banking division, covering environments such as electronic trading, algo-trading, fixed income, and FX. It monitors that environment and enables the bank to significantly reduce downtime. Although it is hard to measure, since implementation we have probably seen increased stability because of it, and we have definitely seen teams become a lot more aware of their environment. Consequently, we can be more proactive in challenging and fixing previously undetected weaknesses. For example, when we were having issues with certificate expiry, we started to use it to manage certificates, validating that certificates were not due to expire or had been correctly refreshed. As a result, we reduced certificate failures in this space by about 70 percent over a period of 12 months. This improvement was predominantly down to the visibility Geneos provided. Because we leverage certain standard configurations with Geneos, we could be very quick and nimble: we just created the required scripts and pushed them out to all the Geneos instances to be deployed. So, all the different teams could leverage this capability with a high level of reuse.
In my previous role, I used Geneos for market data. It plugs into the Thomson Reuters platform and is very good for market data. Geneos provides lightweight data collection that sits on the host. We run it on time-critical servers, and it doesn't have any performance impact on electronic trading.
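To illustrate seeing a problem several steps upstream, here is a minimal Python sketch of walking a dependency map. It is not Geneos configuration: the application names, the `upstream_of` map, and the `healthy` lookup are hypothetical, and it assumes each application's health is already known from monitoring.

```python
from collections import deque


def unhealthy_upstreams(app: str, upstream_of: dict[str, list[str]],
                        healthy: dict[str, bool]) -> list[tuple[str, int]]:
    """Walk the upstream chain of `app` breadth-first and return
    (provider, hops) for every unhealthy provider found.

    Applications missing from `healthy` are treated as healthy.
    """
    found: list[tuple[str, int]] = []
    seen = {app}
    queue = deque([(app, 0)])
    while queue:
        current, hops = queue.popleft()
        for provider in upstream_of.get(current, []):
            if provider in seen:  # guard against cycles and shared providers
                continue
            seen.add(provider)
            if not healthy.get(provider, True):
                found.append((provider, hops + 1))
            queue.append((provider, hops + 1))
    return found


# Hypothetical business flow: feed-handler -> market-data -> pricing -> risk-report
upstream_of = {
    "risk-report": ["pricing"],
    "pricing": ["market-data"],
    "market-data": ["feed-handler"],
}
healthy = {"feed-handler": False}
print(unhealthy_upstreams("risk-report", upstream_of, healthy))
# [('feed-handler', 3)]
```

Because the walk is breadth-first, the hop count reported for each failing provider is the shortest distance from the application being investigated.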
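The certificate check described above can be sketched as a small script that runs on each host. This is a minimal Python illustration, not the script the bank used: the certificate names, the 30-day and 7-day windows, and the CSV layout (a header row followed by one row per certificate, assumed to suit a Geneos Toolkit-style sampler) are all assumptions. It relies only on the standard library's `ssl.cert_time_to_seconds`, which parses a certificate's `notAfter` string.

```python
import ssl
import time

WARN_DAYS = 30  # hypothetical warning window
CRIT_DAYS = 7   # hypothetical critical window


def days_until_expiry(not_after: str, now: float) -> float:
    """Days left before a certificate's notAfter time, e.g. 'Jan  4 00:00:00 2030 GMT'."""
    return (ssl.cert_time_to_seconds(not_after) - now) / 86400


def status_for(days_left: float) -> str:
    """Map days remaining onto a simple severity label."""
    if days_left < 0:
        return "EXPIRED"
    if days_left <= CRIT_DAYS:
        return "CRITICAL"
    if days_left <= WARN_DAYS:
        return "WARNING"
    return "OK"


def report(certs: dict[str, str], now: float) -> str:
    """Render a header row, then one CSV row per certificate sorted by name."""
    rows = ["certificate,daysLeft,status"]
    for name, not_after in sorted(certs.items()):
        days = days_until_expiry(not_after, now)
        rows.append(f"{name},{days:.0f},{status_for(days)}")
    return "\n".join(rows)


if __name__ == "__main__":
    # Hypothetical inventory; a real script would read these from the host's certificate store.
    certs = {"api-gw": "Jan  4 00:00:00 2030 GMT", "md-feed": "Jun  1 00:00:00 2030 GMT"}
    print(report(certs, time.time()))
```

Keeping the check as a plain script that writes rows to standard output is what makes it easy to push the same file to every instance and reuse it across teams.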
If a server CPU is at 90 percent, you can get that alert within seconds with any monitoring system. Therefore, what is more relevant is:
- How do you manage those alerts?
- How do you consolidate those alerts so you are getting relevant information?
With Geneos, you can alert on defined thresholds, so an alert can be raised as a warning or, if there is an action attached to it, bumped up to the top.
We are using it to set thresholds, so we can see what is occurring and intervene before a problem develops.
Before Geneos, we didn't really have an effective way of managing alerts.
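The warning-versus-escalation behaviour described above can be illustrated with a generic Python sketch. It does not use Geneos rule syntax; the `Rule` fields, the threshold values, and the attached-action flag are hypothetical stand-ins for however a given estate is configured.

```python
from dataclasses import dataclass
from enum import IntEnum


class Severity(IntEnum):
    OK = 0
    WARNING = 1
    CRITICAL = 2


@dataclass(frozen=True)
class Rule:
    metric: str
    warning: float
    critical: float
    has_action: bool = False  # hypothetical: an action is attached to this rule


@dataclass(frozen=True)
class Alert:
    host: str
    metric: str
    value: float
    severity: Severity
    escalated: bool


def evaluate(rule: Rule, host: str, value: float) -> Alert:
    """Compare one sample against the rule's warning and critical thresholds."""
    if value >= rule.critical:
        severity = Severity.CRITICAL
    elif value >= rule.warning:
        severity = Severity.WARNING
    else:
        severity = Severity.OK
    # Only active alerts on rules with an attached action are escalated.
    escalated = rule.has_action and severity > Severity.OK
    return Alert(host, rule.metric, value, severity, escalated)


def prioritise(alerts: list[Alert]) -> list[Alert]:
    """Drop OK samples; order escalated alerts first, then by severity, then host."""
    active = [a for a in alerts if a.severity > Severity.OK]
    return sorted(active, key=lambda a: (not a.escalated, -a.severity, a.host))


cpu = Rule("cpu_percent", warning=80, critical=90)
disk = Rule("disk_percent", warning=85, critical=95, has_action=True)
queue = prioritise([evaluate(cpu, "h1", 92), evaluate(disk, "h2", 88), evaluate(cpu, "h3", 50)])
print([(a.host, a.severity.name, a.escalated) for a in queue])
# [('h2', 'WARNING', True), ('h1', 'CRITICAL', False)]
```

Filtering out OK samples and sorting the rest is one simple way to address the consolidation question: the console shows a short, ordered list rather than every raw threshold breach.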
What is most valuable?
The flexibility of the console is probably the biggest value: the ease with which you can pull the data together onto a screen. You can pivot the screen to look at it however you choose. So, you can take a simple approach and have it show the business flow, then give it to a manager or business user, who can see their flow and quickly understand it. Alternatively, you can create more of a technical view and look at more of the environment through a construction or routine lens.
The second biggest value is the ease of being able to configure and modify alerts to better manage them.
The third biggest value is that you can automate responses by getting it to run scripts. You can invoke a script automatically based on an event, or trigger it manually if you want to manage the situation carefully. We also integrated it with ServiceNow, so an event on the console automatically generates a ticket, which gives us an audit trail. The added benefit of the ServiceNow integration is that you can leverage its on-call functionality to provide responses out of hours.
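The automatic-or-manual response pattern can be sketched in Python, assuming a process-down event, an allow-list of processes that are safe to restart unattended, and an in-memory list standing in for the ServiceNow ticket (a real integration would call ServiceNow instead). The `ResponseHandler` class, the restart command, and the ticket fields are all hypothetical.

```python
import subprocess
import sys
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Event:
    host: str
    process: str
    state: str  # e.g. "DOWN" or "UP"


@dataclass
class ResponseHandler:
    auto_restart: set[str]  # hypothetical allow-list of processes safe to restart unattended
    tickets: list[dict] = field(default_factory=list)  # stand-in for ServiceNow tickets
    pending: list[Event] = field(default_factory=list)  # events awaiting a manual decision

    def restart_command(self, event: Event) -> list[str]:
        # Hypothetical restart script: it only prints what it would do.
        message = f"restarting {event.process} on {event.host}"
        return [sys.executable, "-c", f"print({message!r})"]

    def handle(self, event: Event) -> str:
        # Every event gets a ticket, so there is always an audit trail.
        ticket = {"host": event.host, "process": event.process, "state": event.state}
        self.tickets.append(ticket)
        if event.state != "DOWN":
            ticket["action"] = "none"
        elif event.process in self.auto_restart:
            result = subprocess.run(self.restart_command(event),
                                    capture_output=True, text=True, check=False)
            ticket["action"] = "auto-restart"
            ticket["output"] = result.stdout.strip()
        else:
            # Not on the allow-list: queue it for someone to trigger by hand.
            self.pending.append(event)
            ticket["action"] = "awaiting-manual"
        return ticket["action"]


handler = ResponseHandler(auto_restart={"fix-gateway"})
print(handler.handle(Event("h1", "fix-gateway", "DOWN")))  # auto-restart
print(handler.handle(Event("h2", "pricer", "DOWN")))       # awaiting-manual
print(handler.tickets[0]["output"])                         # restarting fix-gateway on h1
```

Keeping unlisted processes in a manual queue mirrors the point above that some situations are better triggered by hand, while the ticket list is written for every event regardless of the path taken.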
What needs improvement?
Mobile phone integration is probably not as rich as it could be.
Another area where I would like to see some improvement is visualising the environment. At the moment, drawing the estate within Geneos is a very manual process, so it would be better if there were a reusable database behind it that linked the environment to the configuration. For example, it could read a CMDB to provide the view of how everything works together. Or, if reading the CMDB is not feasible, the effort put into creating your diagram could be used to generate a CMDB from it. This would be very valuable because application teams have to pull information together to show where host A sits in relation to host B, and at the moment this takes a lot of manual effort with very little reusability.
For how long have I used the solution?
I have been using Geneos pretty much since I joined the bank eight years ago, across two roles. When I joined, I headed up market data, and Geneos was already in place there. Then, I changed roles and moved into applications support for the front office, where I introduced Geneos and helped create an enterprise deal enabling it to be rolled out across all areas of the front office.
What do I think about the stability of the solution?
It is stable.
There is very little maintenance for Geneos itself. Sometimes, you have to upgrade it. The effort goes more into building standards and improving the capability of the solution.
What do I think about the scalability of the solution?
It scales relatively well.
About 8,000 hosts are covered by it.
User roles are predominantly application support.
How are customer service and technical support?
The technical support is good. I haven't dealt with them directly, but I know some people who have, and they have been relatively responsive.
I normally deal with the account team. I have had a number of sessions with them, which have been quite good. Where there have been gaps in the product, they have taken that feedback on board and tried to enhance the product.
Which solution did I use previously and why did I switch?
Before Geneos, we didn't really have an effective way of managing alerts.
When I joined, it was being used within market data. Then, I moved into investment banking. Since it was not being used in investment banking, I took the product into that area.
How was the initial setup?
There are different ways of doing it. I think installing Geneos itself is relatively straightforward. When you scale Geneos, you can do it in one of two ways:
- You can give each team Geneos, and they can do it themselves. That is not ideal because they end up setting it up in slightly different ways.
- You can try to have a central team engineer it, which is better, but obviously it takes longer to do that.
If you said, "I have a small estate that I want to get monitored," then instrumenting your estate from zero to done can be achieved in a relatively short period of time.
When I rolled this out to my area, I just gave it to individual teams, as I felt we were behind where we needed to be from a monitoring perspective. I just said, "Look, get the product out there. Start using it. Let's get some value out of it. Then, at some point in the future, we will work out how we converge onto some standards." That is what we did.
Another team, which came to it a little later, saw what we had done and the benefits we were getting. Having the advantage of a central engineering team, they took the time to engineer it and define a standard, then pushed that standard out. Although this took longer to deploy, it has benefits, because they can now do things more quickly with that standard. My area then started converging onto that standard, but we effectively had to do a double build.
All things considered, if I went through the same thing again, I would still probably do it the way I did, because we started getting value out of the product straight away rather than waiting to build a consistent approach and then push it out. That was critical for me due to the immaturity of our monitoring.
For each team in each area, it took about two months to go from zero to having it deployed and getting value back from it. Some teams were quicker than that if they had previous experience with it, but for teams with zero experience, it took about two months. More sophisticated monitoring and automation takes a bit longer, and if you are looking to instrument your whole environment, especially without any incremental or dedicated resources, then you are probably looking at a couple of years.
What about the implementation team?
We didn't have any incremental bodies to go and do this. The existing production support team did it themselves. There was no additional funding to build this capability out; it was just the existing guys during their day job. I told them, "This will help you. Get this done, and start using it."
What was our ROI?
We have seen ROI because the environment is far better managed.
What's my experience with pricing, setup cost, and licensing?
When I first came in, their pricing was very high. ITRS had high expectations of what their price should be based on perceived value. I think they have realized more recently that there are other competitors, so their pricing is a lot better now. Licensing for on-premises is okay; however, I feel there is still quite some work to be done for cloud and containers. We're still working with them to try and work out what that pricing should look like.
In terms of value, you have to negotiate with them to get a good deal for the product, but that is no different to any other vendor. If you don't negotiate, you will end up paying a relatively high price for it. If you negotiate, you can get a much better deal.
We have a tiered pricing model with an ELA. Every other year, we agree on the pricing and work out how many licenses we are using. It is all predefined because, when we started the contract, we agreed the rate card and made sure that price increases were limited to RPI-type increases. I feel that is a good model. Previously, we didn't have an ELA; we had loads of individual contracts, and everyone was paying a different price. That pricing was neither competitive nor great, so we spent some time consolidating all those contracts to get global pricing.
There are some optional add-ons, if you want them.
Which other solutions did I evaluate?
We looked at some other tools, predominantly AppDynamics. It comes from a slightly different perspective, as it is aimed more at performance monitoring. It was a lot harder to derive value from it than it has been from Geneos. Geneos was an easier tool for the teams to get used to, onboard, and immediately get value out of. With AppDynamics, you have to spend an awful lot of time before you can get value out of it. It is also more suited to certain applications than others, whereas Geneos is a bit more generic and can probably work in most spaces.
We were also evaluating some homegrown solutions, which were lower cost. In my opinion, Geneos wins against homegrown solutions because it has been around for a number of years and a lot of people have fed ideas into it. It has evolved based on feedback from various clients, because there is a dedicated team behind the Geneos product. Homegrown solutions, on the other hand, are limited by your own experience and rarely mature, as funding ultimately becomes an issue, so they end up less function-rich than Geneos.
If you look at some competitors, such as AppDynamics, they probably have a better way of discovering dependencies and the connectivity to them. That is probably another area for Geneos to improve on.
What other advice do I have?
Determine the scale you need. If you want to go enterprise-wide, it is probably worthwhile standardizing on the design. However, if you're a small shop and already bleeding (with no effective monitoring in place), then just get it out there as quickly as possible and think about standardizing afterwards. Think about what you want from the product. You can just use it for monitoring, but the product is very capable, and you can get a lot more value out of it by sharing it with the business to demonstrate business flows and picture the environment in that dimension.
Definitely consider the automation and scripting capabilities of the product, which are very powerful. They save you from having to jump onto boxes and run commands yourself, and because the commands are scripted, you avoid human error.
The solution’s web-based UI is functional. It is not as rich or as powerful as the console, but it gives managers in the business a high-level view of the environment. An analyst or support person is probably better off with the console than a web-based view.
We haven't really played with the application performance monitoring much. I believe the features they have come out with will help us start seeing trends over time and make improvements based on them. However, I haven't really tested that part out.
We are not using it for predictive analysis.
I would rate this solution as an eight out of 10.
Which deployment model are you using for this solution?