2014-06-08 06:32:00 UTC

When evaluating Server Monitoring, what aspect do you think is the most important to look for?

Let the community know what you think. Share your opinions now!

10 Answers
Real User, TOP 20

There are multiple angles a consultant should look at when it comes to monitoring. Let me list a few.

1. Having separate tools to monitor servers, the network and so on is the traditional method, and it is no longer a value proposition. Look for a tool that can do full-stack monitoring of the environment. This reduces unnecessary integration effort and avoids data being chopped up across multiple integration points. It also keeps the data flow seamless, which helps you manage the environment from a single console.

2. The product you select should be extensible to AI-based methods, as these are going to have a huge impact on infrastructure operations. How complex that is to build is an open question, but it is always good to make a start, as you don't want to be left out of the AIOps race.

3. The product should be deployable from Docker/container images, which helps in scaling the monitoring solution horizontally.

4. Strong event management should be available to help with event correlation, deduplication and so on. It is expected to become obsolete in the future, but I believe it will be around for some time until deep learning and machine learning algorithms mature.

5. Integration capabilities with third-party systems (API, SNMP, TCP, logs).

6. Finally, cost plays a major role, so work out what you want in your environment. Product selection should be based on solving and proactively detecting issues in your environment, and on adding the value described above.

There are other pointers, such as ease of use, support and user experience, which are a must for any product.
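To make the event deduplication in point 4 concrete, here is a minimal sketch in Python. It assumes an event is just a host, a check name and a timestamp (the `Event` class and `deduplicate` function are hypothetical names, not taken from any product), and it collapses repeats of the same host/check pair that arrive within a time window of the first one.

```python
from dataclasses import dataclass


@dataclass
class Event:
    # Hypothetical event shape for illustration only.
    host: str
    check: str
    timestamp: float  # seconds


def deduplicate(events, window_seconds=300.0):
    """Collapse repeats of the same (host, check) seen within the window.

    Returns a list of (first_event, occurrence_count) tuples.
    """
    latest_group = {}  # (host, check) -> index of its most recent group
    groups = []
    for event in sorted(events, key=lambda e: e.timestamp):
        key = (event.host, event.check)
        index = latest_group.get(key)
        if index is not None:
            first, count = groups[index]
            if event.timestamp - first.timestamp <= window_seconds:
                # Same problem reported again: count it, do not re-alert.
                groups[index] = (first, count + 1)
                continue
        # First sighting, or the window has expired: start a new group.
        latest_group[key] = len(groups)
        groups.append((event, 1))
    return groups
```

A real event manager would also correlate across hosts and checks; this only shows the duplicate-suppression part.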

Hope it helps!!

2018-07-31 06:20:13 UTC
User, TOP 20

There are 4 things you should have in mind when looking for a monitoring system.

1. Do not take the articles that review and compare multiple monitoring systems too seriously. These articles usually focus too much on how many sensors a system delivers and too little on what really matters.

2. Look more at the stuff that lives forever; how the monitoring system handles data.

- What capabilities does it have when it comes to dealing with dependencies?
- Does it store data in a way that makes it easy to implement AI?
- How well can it handle notifications?
- How scalable is it?
- How easy is it to implement custom sensors?
- Does it have any useful features that other monitoring systems do not have?

Bjørn Willy Stokkenes, the architect of Probeturion, wrote an interesting article about these things on LinkedIn.

3. Does the vendor deliver proper support?
- Do they answer quickly?
- Do they understand your questions or do they make you send a lot of unrelated information about your settings and so on?
- Do they offer to support you in setting up your monitoring system?
- Do they offer to build custom sensors for you?

4. Do not get fooled by a low price. Remember, your time and your workers' time are worth a lot of money. Sometimes saving 90% on the purchase of an IT system can make you lose 100 times more in wasted man-hours.
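On the question of how a system deals with dependencies, the usual idea is that when a switch fails you get one alert for the switch, not one for every server behind it. A minimal sketch in Python, assuming the dependencies are available as a plain dictionary (the `root_causes` function and the node names are made up for illustration):

```python
def root_causes(down, depends_on):
    """Return the failing nodes that have no failing direct dependency.

    down:       set of node names that are currently failing.
    depends_on: dict mapping a node to the nodes it directly depends on.
    """
    roots = set()
    for node in down:
        parents = depends_on.get(node, ())
        # A node whose dependency is also down is probably a symptom,
        # so only nodes with healthy (or no) dependencies are reported.
        if not any(parent in down for parent in parents):
            roots.add(node)
    return roots


# Hypothetical topology: two servers behind one switch behind one router.
topology = {
    "web1": ["switch1"],
    "db1": ["switch1"],
    "switch1": ["router1"],
}
print(root_causes({"web1", "db1", "switch1"}, topology))
```

This only looks at direct dependencies. It is worth asking a vendor how their product handles longer chains and partially monitored paths.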

2018-07-31 07:57:40 UTC

Security: which protocols are supported, which are not, and which security standards are covered, e.g. FIPS.

Which OSes and databases are supported, both for capacity planning and for clustering support.

What technologies can be monitored.

2018-07-31 12:20:37 UTC

I think there are three things that should be considered along with the other comments here:

CONTEXT - what else connected to that server is being monitored? Diagnosing faults can be tricky, and it's made much more difficult if you have to go from one monitoring tool for the server to (many?) others for all the devices connected to that server. A tool that shows that server in context with all the things it's connected to can make diagnosing network issues simple.

SELF-HEALING - half the time the tried-and-true power cycling of the device in question solves the problem. If the admin understands the system and knows that the server will occasionally require rebooting, why wake him up at 2am? The monitoring solution should be able to automatically execute self-healing actions like this based on preset conditions. This makes the difference between a 2AM call and a note in the admin's inbox when he gets in the next morning.

PROACTIVE ALERTS - if the user notices the network is down you're already losing money and gaining ill-will. A good monitoring tool will let you know when failures are about to happen and alert you before they start impacting your users.

And finally, one last thought: it's nice to have a monitoring tool that will alert the entire IT team via something like Slack in case the admin in question is unable to respond in a timely manner.
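The self-healing idea above can be sketched roughly as follows in Python. The health check, the restart action and the notification channel are passed in as plain callables, and all names here are hypothetical. A real tool would wire these to its own checks, scripts and alert channels.

```python
def handle_failure(check_ok, restart, notify, max_restarts=1):
    """Try a preset remediation before escalating to a human.

    check_ok: callable returning True when the service is healthy.
    restart:  callable performing the remediation (e.g. a reboot).
    notify:   callable taking (severity, message).
    """
    if check_ok():
        return "healthy"
    for attempt in range(1, max_restarts + 1):
        restart()
        if check_ok():
            # Fixed without a human: leave a note for the morning.
            notify("info", f"recovered after {attempt} restart(s)")
            return "recovered"
    # The preset action did not help: this one is worth the 2 AM call.
    notify("page", f"still failing after {max_restarts} restart(s)")
    return "escalated"
```

The point of the `max_restarts` limit is that an action which keeps failing should turn into a page rather than an endless reboot loop.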

2018-07-30 19:36:43 UTC

An up-to-date product (or one that continues to get regular updates), ease of use, and an aesthetically pleasing interface.

2018-07-30 16:34:57 UTC
Real User

IMO, it's best to engage the app/system/service owners and ask them what they want to see monitored. The experts are usually going to be those who built the service you are monitoring. Since an engineer is going to get the call at 2 AM when the alarm you set up trips, it's important to work closely with them so you can iron out good thresholds for the warning and then the alarm. Engage the NOC and see if there is first-level support they can do to avoid that 2 AM call. I stick with a default base template built from the OS vendor's recommendations, and then we tweak it to be more accurate for our environment. Server/OS monitoring is pretty standard across the board; I find it's the application/service monitoring that takes a lot more thought.

In the end, these are the questions that usually wrap up the meeting: When do you want me to wake you up at 2 AM? What condition on the system warrants this call? When do you want me to send an automatic email for awareness? When do you want a ticket and email only?

Every organization will have its own method for monitoring, and it should be an ever-growing and evolving process. Every outage should have an RCA, and the monitors should be reviewed. Did we know this was coming? Could we have alerted sooner and avoided user impact? How should we monitor going forward?
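The answers to those wrap-up questions can be written down as a small policy table. Here is a rough sketch in Python. The threshold values and action names are invented for illustration; the real numbers are whatever you agree with the service owner.

```python
# Hypothetical policy for one metric (percent used), highest tier first.
POLICY = [
    (95.0, ("page", "ticket", "email")),  # wake someone up at 2 AM
    (85.0, ("ticket", "email")),          # ticket and email only
    (75.0, ("email",)),                   # automatic email for awareness
]


def actions_for(value, policy=POLICY):
    """Return the actions of the highest tier whose threshold is reached."""
    for threshold, actions in sorted(policy, key=lambda tier: tier[0], reverse=True):
        if value >= threshold:
            return actions
    return ()


print(actions_for(88.0))  # falls in the "ticket and email only" tier
```

Keeping the policy as data rather than code makes it easy to review after each RCA and adjust the tiers.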

2018-07-30 14:19:03 UTC

Our servers are so different in terms of monitoring protocols! Some of them support SNMP, some SSH, and some neither, so you need to install some kind of agent. For all of them we need to monitor CPU load, memory usage, disk usage, bandwidth, cloud services, web page/site responsiveness, VoIP, SQL, SSH, FTP, HTTP/HTTPS... We tried several tools, including the ones described here. In the end we found CloudView NMS (http://www.cloudviewnms.com), which had the set of features we needed out of the box. It is universal and combines both network and server monitoring.

2018-09-25 19:09:45 UTC

1. Learning curve. If it is low, the various monitoring users can build and fine-tune the monitoring themselves, making you less of a bottleneck.
2. API integration capabilities, specifically with the ticketing tool, with Telegram or a similar tool, and for report generation.
3. Quality of support for the tool. If there are issues, how quickly can they be resolved? Since you are monitoring the environment, your downtime should be measured in minutes, not hours or days.
4. Automation capabilities. For DevOps: automated provisioning and decommissioning, and auto-correction (self-healing).

I wouldn't stress about proactive alerts as those are very very basic capabilities and should exist by default in any monitoring tool.

I would recommend Zabbix, Grafana and Ansible to get started with.

2018-07-31 04:30:17 UTC

I think the most important are:
Processor Utilization
Memory Utilization
Disk Utilization
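
As a small illustration of the disk figure, here is a sketch using only the Python standard library. The function name is made up. CPU and memory figures normally come from an agent or a third-party library such as psutil, so they are left out here.

```python
import shutil


def disk_utilization_percent(path="/"):
    """Return used space on the filesystem holding `path`, as a percentage."""
    usage = shutil.disk_usage(path)  # named tuple: total, used, free (bytes)
    return 100.0 * usage.used / usage.total


print(f"{disk_utilization_percent('.'):.1f}% used")
```

The number may differ slightly from what `df` reports, because some filesystems reserve blocks that are counted differently.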

2018-07-31 02:28:16 UTC
Real User

The most important thing is trending, along with support for multiple ways of alerting:

A. Paging
B. Case
C. Mail
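
To show why trending matters, here is a minimal sketch in Python that fits a straight line to recent usage samples and estimates how long until a limit is reached. The function name and the sample data are hypothetical, and a plain least-squares line is only one simple way to trend a metric.

```python
def hours_until_full(samples, limit=100.0):
    """Estimate hours from the last sample until usage reaches `limit`.

    samples: list of (hour, percent_used) pairs, in time order.
    Returns None when there is too little data or usage is not growing.
    """
    n = len(samples)
    if n < 2:
        return None
    mean_t = sum(t for t, _ in samples) / n
    mean_v = sum(v for _, v in samples) / n
    var_t = sum((t - mean_t) ** 2 for t, _ in samples)
    if var_t == 0:
        return None
    # Ordinary least-squares slope and intercept.
    slope = sum((t - mean_t) * (v - mean_v) for t, v in samples) / var_t
    if slope <= 0:
        return None
    intercept = mean_v - slope * mean_t
    t_full = (limit - intercept) / slope
    return max(0.0, t_full - samples[-1][0])


# Hypothetical disk usage growing 2 points per hour from 50%.
print(hours_until_full([(0, 50.0), (1, 52.0), (2, 54.0), (3, 56.0)]))
```

An estimate like this can then be sent through whichever channel fits its urgency: a page, a case, or a mail.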

2018-07-30 17:12:07 UTC