Disclaimer: The writer is the VP of Products at indeni
SSH vs. SNMP
The majority of today's standard network monitoring tools are based on SNMP, which makes them “generalists”: they offer broad coverage of almost the entire network infrastructure, a big advantage. The downside, however, is that they cannot penetrate beyond the surface of the devices being monitored, especially in terms of data collection and insight.
The reliance on limited protocols such as SNMP is the main restrictive factor of today's standard network monitoring tools. While many firewalls and routers support SNMP, they provide only a small subset of their running configurations through the protocol.
Additionally, SNMP-based monitoring is reactive by nature: the issue is flagged only after a certain event has occurred. In most cases, the monitoring system also needs to “know” which OID to look for and in which specific MIB.
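To make the OID point concrete, here is a minimal sketch in Python. It does not talk to a real device: `FAKE_AGENT` is a hypothetical stand-in for an SNMP agent, and the last OID in it is a made-up placeholder. The two OIDs in `KNOWN_OIDS` are the standard MIB-II `sysDescr` and `sysUpTime` objects. The poller only ever sees what it was configured to ask for.

```python
# OIDs the monitoring system has been "taught" (standard MIB-II objects).
KNOWN_OIDS = {
    "sysDescr": "1.3.6.1.2.1.1.1.0",
    "sysUpTime": "1.3.6.1.2.1.1.3.0",
}

# Hypothetical stand-in for a device's SNMP agent: OID -> value.
# The last OID is a made-up placeholder, used for illustration only.
FAKE_AGENT = {
    "1.3.6.1.2.1.1.1.0": "Example Firewall v1.0",
    "1.3.6.1.2.1.1.3.0": 123456,
    "1.3.6.1.4.1.99999.1.1.0": "debug enabled",
}


def poll(agent, oids):
    """Return values only for the OIDs the poller was configured with."""
    return {name: agent.get(oid) for name, oid in oids.items()}


result = poll(FAKE_AGENT, KNOWN_OIDS)
print(result)
```

The “debug enabled” value exists on the simulated device, but it never reaches the monitoring system because nobody told the poller which OID to query.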
By contrast, more robust protocols such as SSH allow a tool to access and extract the full range of configuration data on network devices, in real time. SSH is also what engineers commonly use to troubleshoot devices. A tool that uses SSH can therefore mimic what a human engineer would do, so the data it can reach and analyze goes far beyond what standard SNMP-based network monitoring tools can reach.
So essentially, using SSH to monitor a network is like monitoring in reverse: instead of looking for symptoms (the end result of a problem), the tool looks for possible causes (the beginning of a problem).
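As a rough sketch of what “looking for causes” could mean in practice, the Python snippet below scans command output of the kind a tool might collect over an SSH session and lists any debug flags that are still enabled. The SSH connection itself is left out, and the output format in `SAMPLE_OUTPUT` is hypothetical, not taken from any real device.

```python
# Hypothetical command output, as it might be collected over an SSH session.
SAMPLE_OUTPUT = """\
debug packet-trace: enabled
debug routing: disabled
logging level: informational
"""


def find_enabled_debug_flags(cli_output):
    """Return the names of debug flags reported as enabled."""
    flags = []
    for line in cli_output.splitlines():
        key, sep, value = line.partition(":")
        # Only lines of the form "debug <name>: enabled" count as a finding.
        if sep and key.startswith("debug ") and value.strip() == "enabled":
            flags.append(key[len("debug "):].strip())
    return flags


flags = find_enabled_debug_flags(SAMPLE_OUTPUT)
print(flags)
```

The check reads the device's state directly, the way an engineer would at the command line, rather than waiting for a counter to cross a limit.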
Knowledge Pre-Loaded in the System
Out of the box, most standard monitoring tools have to acquire information. The system must be “taught” which devices on the network to query and which types of potential errors and failure points to isolate. This is a time- and resource-intensive process, and it always impacts an IT team's efficiency: the team has to invest considerable time in researching what is best to teach the system and how to go about it.
Anyone coming from an ops background can relate to this feeling: it always seems as if end users “outwit” the monitoring system, reporting an application-level issue (email, CRM, etc.) before any monitoring tool does. This “total failure” usually happens when monitoring relies on a set threshold for high CPU or low memory. Many other events take place before the threshold is met, events that could have been noticed and resolved beforehand.
In the illustration above, there was a single issue which caused a constant increase in CPU utilization, eventually causing the device to "misbehave". This whole scenario could have been avoided if the debug flags had been turned off again.
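The difference in timing can be sketched with a few lines of Python. All numbers here are invented for illustration: `SAMPLES` is a hypothetical timeline in which debug flags are left on at minute 10 and CPU utilization climbs until it crosses an assumed 90% threshold.

```python
CPU_THRESHOLD = 90  # percent; hypothetical alert threshold

# Hypothetical samples: (minute, cpu_percent, debug_flags_enabled)
SAMPLES = [
    (0, 20, False),
    (10, 25, True),   # debug flags switched on and forgotten
    (20, 45, True),
    (30, 70, True),
    (40, 95, True),   # threshold finally crossed
]


def first_alert(samples, predicate):
    """Return the minute of the first sample that satisfies predicate, or None."""
    for minute, cpu, debug_on in samples:
        if predicate(cpu, debug_on):
            return minute
    return None


# Symptom-based check: fires only once CPU is already too high.
symptom_at = first_alert(SAMPLES, lambda cpu, debug_on: cpu >= CPU_THRESHOLD)
# Cause-based check: fires as soon as the debug flags are seen enabled.
cause_at = first_alert(SAMPLES, lambda cpu, debug_on: debug_on)

print(symptom_at, cause_at)  # 40 10
```

In this made-up timeline the cause-based check raises its alert 30 minutes before the threshold alert, while the device is still behaving normally.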
Knowledge is Power
No matter how talented your team is, it will never be able to cover all the “known” issues found in thousands of different network setups across the globe. In addition, networks keep growing in both size and complexity, while IT teams only get (relatively) smaller over time. The secret to winning this ongoing battle between network size and team size is automation combined with “social” power: sharing knowledge among users across the globe.
A quick study we put together shows that the average networking team adds 1-2 checks/updates over the course of 3 months. This low number is the result of various limitations, the main one being time. There is never enough of it. There are always fires to put out, scheduled upgrades, meetings, users: things familiar to everyone. A tool that adjusts and grows with you and your network is the way forward and a means of filling that time gap.
Visit our website - www.indeni.com