What are some must-have features or elements to beware of?
We implemented up.time as our multi-platform monitoring solution. Up.time is by far the easiest to implement and maintain, and little to no training is required to be up and running in literally one day.
To expand on some of the ideas above, what I am always particularly interested in is how easy it is to add our own custom scripts and dashboards. Every product has pluses and minuses, but what sells me on a product is the ability to get it to give me what I need for my environment, hopefully with the base product but sometimes by working with the vendor on a custom solution.
Ok, here are specific things I would ask about a vendor's product.
Does the product really support multiple OS platforms? The test for this
is easy. Does their main monitor station run on different OS choices? Do
they offer the same level of support on all platforms, or does one OS have
an agent and the others are "just SNMP"?
Does the product support multiple database choices? Can you choose between
Oracle, MariaDB, MySQL, SQL Server or others? You should have the choice of
what fits in your environment.
Is the product reasonably browser neutral? Try it from IE and Firefox and
Chrome or whatever is used in your environment. The main browsers should
Is the database accessible from an external request? Can your team write
their own custom SQL requests, or better yet is there a documented API for
accessing the database?
Does the product support accessing agentless devices such as temperature,
humidity, or wetness environmental monitors via SNMP?
Does the product support monitoring webpages, including testing for text
returned and also for the time to access the page? Can that data be saved
for graphing later?
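As a rough illustration of what such a check involves, here is a minimal Python sketch, not taken from any vendor's product. It fetches a page, tests for expected text, times the request, and appends the sample to a CSV file for graphing later. The function names, the CSV layout, and the injectable `fetch` parameter are all assumptions made for this example.

```python
import csv
import time
import urllib.request
from datetime import datetime, timezone


def fetch_body(url, timeout=10.0):
    """Download a page and return its body as text."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return response.read().decode("utf-8", errors="replace")


def check_page(url, expected_text, fetch=fetch_body):
    """Return one sample: timestamp, URL, elapsed seconds, and text match."""
    started = time.perf_counter()
    try:
        body = fetch(url)
        text_found = expected_text in body
    except OSError:
        # urllib.error.URLError and socket timeouts are OSError subclasses.
        text_found = False
    elapsed = time.perf_counter() - started
    return {
        "timestamp": datetime.now(timezone.utc).isoformat(),
        "url": url,
        "seconds": round(elapsed, 3),
        "text_found": text_found,
    }


def append_sample(csv_path, sample):
    """Append a sample to a CSV file (hypothetical layout) for later graphing."""
    with open(csv_path, "a", newline="") as handle:
        csv.writer(handle).writerow(
            [sample["timestamp"], sample["url"], sample["seconds"], sample["text_found"]]
        )
```

Run from a scheduler every few minutes, something like `append_sample("checks.csv", check_page(url, "Welcome"))` would build up a history that any graphing tool can read. A real product should give you this without scripting; the sketch only shows what to ask for.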
Is security real and important to the vendor? If the product supports
logins, then log in as a non-privileged user and click on every button you
see, and search in every search box that is available. Can you see things
you shouldn't, or find servers that should be protected?
How many ports are required to support servers and equipment through a
firewall? I can't think of a reason that the answer should be more than
Is the connection from the monitoring station to the servers secure? Does
it use XINETD with a security configuration, or at least password
protection between monitor and server?
Can alerts be easily shared between notification lists so modular alert
groups can be created and shared?
Is it easy to clone an existing alert or do you have to start "from
scratch" every time?
Can alert information be easily shared with external scripts? Can all of
the parts of an alert be handed off to a Perl script, for instance?
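To make the handoff idea concrete, here is a small Python sketch of one way a monitor could pass an alert to an external script: the whole alert as JSON on standard input, plus each field as an `ALERT_*` environment variable. This is an assumption about how a handoff could work, not a description of any specific product, and the field names (`host`, `status`) are hypothetical.

```python
import json
import os
import subprocess
import sys


def hand_off_alert(alert, command):
    """Run an external handler with the alert as JSON on stdin and ALERT_* env vars."""
    env = dict(os.environ)
    for key, value in alert.items():
        env["ALERT_" + key.upper()] = str(value)
    result = subprocess.run(
        command,
        input=json.dumps(alert),
        env=env,
        capture_output=True,
        text=True,
        timeout=30,
    )
    return result.returncode, result.stdout


if __name__ == "__main__":
    # A one-line Python handler stands in for the external (e.g. Perl) script.
    handler = [
        sys.executable,
        "-c",
        "import json, os, sys; a = json.load(sys.stdin); "
        "print(a['host'], os.environ['ALERT_STATUS'])",
    ]
    print(hand_off_alert({"host": "db01", "status": "CRITICAL"}, handler))
```

The point to check with a vendor is whether every part of the alert (host, service, status, message, timestamps) reaches the script, not only a pre-formatted message string.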
Is the product team centrally located? It doesn't matter where they are,
but if a tech support person wants to talk to a developer can they just "walk
over" to the developer or do they have to send an email off to someone?
Is support focused on your needs, or do they seem tied to their own metrics
(like ITIL)? Hint: If they want to "close this ticket and open a new one",
that is not a good sign. If the support folks are forced to be more
concerned with their metrics than your problem, you probably won't be a
So, these were my concerns. I ended up choosing Uptime, out of Toronto,
Canada. We've been very happy with it.
One other thing to consider in light of the Target data breach: what is the security track record of this system? Monitoring requires very low level access to physical, OS and application infrastructure. One of the doorways that was used in the Target breach was a common password amongst the various runtime elements of their monitoring app (not naming it as I don't have direct experience with it, but it is discussed in various articles on the breach).
This is one of the big "step ups" that you have to make when moving from simpler monitoring tools to ones that are more integrated.
So, what's a 'mid market solution', anyway? The answers are all over the place; it could mean the classic 200-1000 seat network on one end, all the way to medium-sized enterprises. But, having worked at (and consulted for) organizations both bigger and smaller, I'll take a stab at it:
Don't sweat the easy stuff. Just about everyone can do a ping analysis and send an alert if a system goes down. What's harder is monitoring applications - ensuring that services are up, files are available, and response time is adequate on key applications. Yes, it is entirely possible to have services down while the server is running - I've even had servers that would respond to ping and nothing else; just a reptilian hind brain left while the cognitive functions are gone. I saw 2 last week, but my current employer has a large network.
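To illustrate the difference between "answers ping" and "the service is up", here is a minimal Python sketch that tests whether the TCP ports behind named services accept connections. The service names, ports, and the `probe` parameter are assumptions for the example; a real application check would go further and verify content and response time.

```python
import socket


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Refused, unreachable, and timed-out connections all land here.
        return False


def check_services(host, services, probe=port_is_open):
    """Map each service name to True (reachable) or False (not reachable)."""
    return {name: probe(host, port) for name, port in services.items()}
```

For example, `check_services("app01", {"http": 80, "ssh": 22})` would report each service separately for a hypothetical host `app01`, so a server that still answers ping but has lost its web service shows up as a failure.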
It has to present clear information in an actionable form. Some call it 'collapsible alerts', where you can identify dependencies; losing a router to a remote office won't trigger alerts from every device there because your rules can be configured to send ONE alert on that failure, not one for each of a thousand different devices. By the way, you should look hard at visual presentation - maps are nice as are graphs, fuel gauges and virtual dials. Don't pick them all, but it's nice to have a choice.
Make sure you can customize the view for the role. The CEO might want to look at overall metrics but don't throw all the details at him. For that matter your network crew might not want every server in creation on their default page, and the systems engineers won't want router details there either - but both groups will want that information somewhere.
Finally, consider upgradability. Vendors release updates every 6-12 months, and there are substantial benefits to getting that update sooner rather than later. It can be a challenge sometimes, so talk with current clients about how hard the upgrade has been the last few years. Time well spent.
As an employee of Fluke Networks supporting the TruView solution, I tend to be biased, and I agree with the above statements. Gartner's latest research on NPMD (Network Performance Monitoring/Diagnostics), or as we at Fluke like to call it, Application Aware Network Performance Monitoring (AANPM), can offer some groundwork on the top players currently in the industry with some great combined APM and NPM solutions. The research and the full report can be found here: http://www.flukenetworks.com/content/gartner It might offer you a good place to start.
Hope this helps, and of course feel free to reach out if you have any questions related to Fluke products.
In addition to the previous responses: what is the demand of your customer or organization? Are users complaining about high response times, and do you want to get a grip on and control of the issues? And does grip and control mean that you want to identify the root cause? Or is the demand that you need to report on the technical availability of the systems, and your end user is not that important?
Wow, that's an open ended question. The 3 previous responses are valid. My feedback is to look for a "complete" solution which will draw your road map as an 'enterprise solution' and not create silos. Whichever product you evaluate, ensure it is expandable enterprise wide and has longevity as well as a wider platform footprint. It might take you a step above a mid-market product, but in the long run it would be advantageous to your company.
I'd be more than happy to discuss this further with you one-on-one if you like. Feel free to reach out to me directly. I've been in this industry for over 10 years as a customer and have used a few products myself :)
As Karl suggests above, it really depends on what you are looking for. Most of the mid-market enterprises that we work with run into a very similar problem. They have outgrown many of the low-end and open source monitoring tools that they began with, yet do not want to heavily invest in the expensive and complex "Big 4" type monitoring frameworks. We created up.time (uptime software) over 12 years ago to meet this specific need.
up.time is a comprehensive and unified all-in-one monitoring software that covers multiple monitoring needs, including server monitoring (Windows, Linux, UNIX, VMs), application monitoring, network monitoring, capacity planning, and SLA reporting. up.time gives IT admins and managers everything from deep dive forensic analysis tools, to unified reporting, to fully customizable dashboards, to proactive alerting across one or many platforms running on-premise, remotely, or in the Cloud. And all of this can be deployed in days, not weeks or months.
We offer live weekly webinars to help you see if up.time is the right fit, as well as a fully supported 30-day enterprise trial.
For those looking for a mid-enterprise monitoring solution, please drop by our website ( http://www.uptimesoftware.com/ ) or email us for more information (info @ uptimesoftware dot com )
*I am an employee of uptime software with over 15 years in enterprise IT and 8 years in the enterprise IT monitoring field
As per what Karl said above, what exact problems are you looking to address? That said, a number of the clients I deal with are typically trying to ascertain whether problems are application or network based.
From this synopsis a good mid-market solution would offer coverage for both network and application (so flow collection and packet collection/analysis capabilities). Also having the ability to collect SNMP would complete a quick snapshot of your environment.
Whilst I work with Fluke Networks products on a daily basis, the TruView solution offers all of these in a single rack unit collecting from 1Gbps to 10Gbps. Their ability to provide a solution with tapping and aggregation also makes Fluke a good place to start.
There are bigger and more comprehensive solutions that address on-server metrics, but these can be much more expensive and more time consuming to deploy.
Coupling on-server capabilities from New Relic is also a very nice solution, as they have a free service (30 day pro service for free) to help potential users understand what they need to collect. New Relic are cloud based so they also provide the ability to monitor hosted solutions that are beyond the reach of physically deployed products.
Again the usual clarifying questions apply.
--> What problem are you trying to solve (i.e. what's not working with what you have today)?
--> What is the profile of the IT system you need to monitor (a Real Time operation will require something different than a 100% PaaS Cloud based solution)?