How has it helped my organization?
ITRS helps us identify many production issues proactively. For example, ITRS alerts when the memory of a critical process grows beyond the configured threshold. The thresholds are set so that the alert fires before the process crashes, which lets support teams recover the process in a controlled manner (usually a planned restart).
A State of the World dashboard that we built gives a bird’s-eye view of the entire production environment. These dashboards are displayed on big screens that are monitored by support teams in multiple locations, which helps us monitor mission-critical components more effectively. FKM plugins can monitor huge logs in real time for errors, exceptions, and other keywords. Alerts are triggered immediately when an exception is logged.
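To illustrate the kind of keyword scanning described above, here is a minimal Python sketch. It is not how the FKM plugin is implemented; the function name `scan_new_lines`, the offset bookkeeping, and the keyword list are all assumptions made for illustration, and log rotation is not handled.

```python
from pathlib import Path
from typing import List, Tuple


def scan_new_lines(log_file: Path, offset: int, keywords: List[str]) -> Tuple[List[str], int]:
    """Return the lines added since `offset` that contain any keyword,
    together with the offset to use on the next call (hypothetical helper)."""
    with log_file.open("rb") as handle:
        handle.seek(offset)
        data = handle.read()
    # Consume complete lines only; a partial trailing line is re-read next time.
    end = data.rfind(b"\n") + 1
    text = data[:end].decode("utf-8", errors="replace")
    matches = [line for line in text.splitlines()
               if any(keyword in line for keyword in keywords)]
    return matches, offset + end
```

Calling the function repeatedly with the offset it returned last time means each line is checked once, which is roughly what makes near-real-time alerting on large logs affordable.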
As far as I know, ITRS does not integrate with any version control system. The ITRS monitoring configuration is defined in a setup XML file (setup.xml) that the gateway reads on startup. When you work on a major monitoring setup, you have to make many changes to the XML. ITRS keeps only the last 10 versions of the changes, so restoring an older version is not possible unless you take regular backups of the setup XML. Integration with a version control system such as SVN would allow every major change to the setup XML to be checked in. If I wanted to restore any version of the XML (say, one from a month ago), I could do that from SVN.
I came across an incident in which the setup XML was mysteriously wiped from the disk. We had to restore the last working XML from an SVN repository that was maintained manually outside the ITRS setup. If ITRS could include an optional SVN configuration (like DB logging), it would be very useful.
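Until something like that exists, the regular backup can be scripted outside ITRS. The following is a minimal Python sketch under stated assumptions: the function names and the backup file naming scheme are hypothetical, and it only copies files rather than talking to SVN. The backup directory could itself be a working copy that is checked in separately.

```python
import hashlib
import shutil
import time
from pathlib import Path
from typing import Optional


def file_digest(path: Path) -> str:
    """Return the SHA-256 hex digest of a file's contents."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def snapshot_setup(setup_file: Path, backup_dir: Path) -> Optional[Path]:
    """Copy setup_file into backup_dir unless an identical copy is already stored.

    Returns the path of the new backup, or None when nothing was copied.
    """
    backup_dir.mkdir(parents=True, exist_ok=True)
    short = file_digest(setup_file)[:12]
    # The digest is part of the file name, so a stored identical version is easy to spot.
    if any(backup_dir.glob(f"*_{short}.xml")):
        return None
    stamp = time.strftime("%Y%m%dT%H%M%S")
    target = backup_dir / f"{setup_file.stem}_{stamp}_{short}.xml"
    shutil.copy2(setup_file, target)
    return target
```

Run from a scheduled job, this keeps every distinct version of the setup XML rather than only the most recent ones.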
What is most valuable?
ITRS can define rules that alert when the parameters you monitor breach a threshold. Rules can be configured to fire recovery actions automatically to clear the alert. If the alert persists for an extended period, ITRS can fire multiple levels of escalation emails or actions.
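The threshold, recovery, and escalation logic can be modelled in a few lines. This is a Python sketch of the idea only, not ITRS rule syntax; the `Rule` class, the `actions_due` function, and the action names are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Rule:
    threshold: float              # value above which the alert is raised
    escalation_delays: List[int]  # seconds in breach before each escalation level


def actions_due(rule: Rule, value: float, seconds_in_breach: int) -> List[str]:
    """List the actions that should have fired by now for a monitored value."""
    if value <= rule.threshold:
        return []  # no breach, nothing to do
    actions = ["recovery_action"]  # fired as soon as the threshold is breached
    for level, delay in enumerate(rule.escalation_delays, start=1):
        if seconds_in_breach >= delay:
            actions.append(f"escalation_level_{level}")
    return actions
```

For example, with `Rule(threshold=2048, escalation_delays=[600, 1800])`, a value that has been above 2048 for 15 minutes yields the recovery action plus the first escalation level.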
It also has powerful visualization features for developing dashboards that can be displayed on big screens and shared via Webslinger (a web server that allows dashboards to be accessed via web browsers). It can log metrics (values) into a reporting database and generate historical charts for trend analysis. The Toolkit plugin can run scripts for custom monitoring requirements that cannot be implemented using standard plugins. There are also features to suppress alerts for a time period (Snooze, as ITRS calls it) or to change the value or severity of a parameter.
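As an example of the kind of script the Toolkit plugin might run, here is a minimal Python sketch that reports the files in a feed directory. It assumes the plugin accepts comma-separated rows on standard output with a heading row first, which should be confirmed against the ITRS documentation; the function name and the column names `file` and `sizeBytes` are hypothetical.

```python
import csv
import io
from pathlib import Path


def feed_file_rows(directory: Path) -> str:
    """Build CSV text with one row per file in `directory` (heading row first)."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["file", "sizeBytes"])
    for path in sorted(directory.iterdir()):
        if path.is_file():
            writer.writerow([path.name, path.stat().st_size])
    return buffer.getvalue()
```

A wrapper script would simply write the result to standard output, for instance `print(feed_file_rows(Path("/path/to/feeds")), end="")`, where the directory is a placeholder.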
There are a number of things that you can do directly from the ITRS Active Console without logging into the server. You can restart processes and open log files directly from the Active Console. It also has a built-in scheduler that can run tasks (commands that you configure) on multiple targets. The active time feature lets you apply different rules at different times. It also helps to disable alerts during non-business hours, which eliminates noise.
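The idea behind active times can be sketched as a simple time-window check. This Python snippet only illustrates the concept and is not ITRS configuration; the function name, the 08:00 to 18:00 window, and the Monday-to-Friday default are assumptions.

```python
from datetime import datetime, time
from typing import Iterable


def in_active_time(now: datetime,
                   start: time = time(8, 0),
                   end: time = time(18, 0),
                   weekdays: Iterable[int] = (0, 1, 2, 3, 4)) -> bool:
    """Return True when `now` falls inside the business-hours window.

    Weekdays follow datetime.weekday(): Monday is 0, Sunday is 6.
    """
    return now.weekday() in weekdays and start <= now.time() < end
```

A rule would then alert only when `in_active_time(datetime.now())` is true, so breaches outside business hours stay silent.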
What needs improvement?
ITRS has a setup XML that holds the entire monitoring configuration. Only the last 10 versions of the setup XML are saved locally on the gateway server. These 10 versions get overwritten quite easily when you are working on a big configuration change. I would like to see ITRS integrate its setup editor with SVN so that the setup XML can be checked in after major changes.
What do I think about the stability of the solution?
I have not observed any stability issues so far. ITRS has a transparent failover mechanism. If the primary gateway fails, the secondary gateway takes over. I have seen the Active Console freeze while it’s failing over, but everything recovers within a few minutes.
What do I think about the scalability of the solution?
I have not observed any scalability issues in an environment with 60+ servers, 1000+ processes, and 5000+ logs to monitor.
How are customer service and technical support?
I am very satisfied with the level of technical support I get from ITRS. All the queries that I raised about monitoring setup and configuration issues were closed in a timely manner. We also have an ITRS technical support specialist visiting our office twice a week. This is very helpful for discussing in person some of the complicated monitoring solutions that we want to implement.
Which solution did I use previously and why did I switch?
I have used Wily Introscope and Nagios, but they are not as comprehensive as ITRS.
How was the initial setup?
Initial setup complexity depends on what you want to implement (e.g., building a dashboard is complicated). You will need a basic training session before you start working on the initial setup. Depending on what you want to achieve, the rules and actions can become a bit complicated.
What about the implementation team?
We have 20+ applications monitored in ITRS and the implementation was done in-house. ITRS has features to import a template setup, which saves a lot of effort on the initial setup. If you are starting from scratch, create a setup template that can be reused on the new gateways you create (e.g., default rules, samplers, etc. can be defined in a template and imported).
What's my experience with pricing, setup cost, and licensing?
I don’t have much visibility into the pricing, as this is negotiated at the enterprise level. I have heard that enterprise-level licensing is quite expensive.
What other advice do I have?
I would definitely recommend this product for monitoring mission-critical applications.
ITRS is the best monitoring tool I have used so far. It comes loaded with many built-in plugins to monitor almost all of the parameters that you would want to monitor in a production environment, e.g., processes, memory, disk space, CPU, keywords in log files, daily feed file generation, web services, MQ, databases, FIX sessions, etc.
The ITRS Active Console provides a powerful interface for monitoring the environment. The gateway setup editor makes it easy to work with the monitoring configuration.