What is our primary use case?
We use it in a few different ways:
- For general monitoring of operating systems.
- Leveraging some customized offerings, specifically for creating application monitoring.
- Some external site-to-site monitoring in various places, ensuring that our websites and external pieces are available over an Internet connection.
How has it helped my organization?
It has given us a clearer view into our environment because it's able to look in and pull things off of the event viewer or log files. We have been able to build dashboards and drill down on things, which has helped improve our time to respond. Also, when specific conditions are met in a given log, we can get in and take a look a lot faster than by connecting to the machine, parsing through the log, and figuring it out ourselves. It flags the condition and works us toward a solution faster than normal.
We have a few custom data sources that we have defined, especially for our application. It is able to leverage a specific data source and build monitoring rather than just having it be a part of the general monitoring. It is segmented and customized for what we actually need, which has been pretty helpful.
Custom data sources have given us a bit more information from both a point-in-time and a historical viewpoint. In the console, it is easy to compare week-over-week or month-over-month traffic and numbers. As changes are made in the environment, we can look back with better historical knowledge, and say, "We started seeing this spike three months ago and this is the change we made," or, "We started seeing this CPU usage reduced after the last patch or software update." When investigating an issue, it lets us compare and get better insight into the environment over a longer period, rather than just at a point in time.
The solution has allowed us to have specific alerting for specific messages. If we know that a particular message indicates that a certain state has occurred, we can set it to trigger either an email notification or a tracking notification. When a log entry means that we have a specific issue, we can have it send an email and let us know. Thus, we have a better, faster response. We also have integrations with PagerDuty, which allow us to make things very specific as to the level of intervention and the timing of that intervention. It has been nice to be able to customize that down to even a message type and timing metric.
The solution’s ability to alert us if the cloud loses contact with the on-prem collectors has been helpful, e.g., if we are having an issue with our Internet connection or with some of our less monitored environments, such as our lower environments in different data centers where we don't have as much monitoring in place. It's helpful to have that external check there, since those environments are not as heavily monitored as production. Typically, we are intervening before it times out and reports that it has lost the connection. This way, we know either via a page or email if there is any sort of latency or timing issue with it connecting to the cloud. It's been helpful that it's not just relying on the Internet connection at our site, but is able to see into our environment and monitor for connectivity or timeout issues.
We use it for anomaly detection because our software is designed to function in a specific way. Anomaly detection is helpful when there are issues that may not be breaking the software but are causing it to run in a nonstandard way. We can then be alerted and notified so we can jump on that issue. Whether the issue will be fixed in the moment or handed off to development to find a solution, it's helpful to have that view into how it's running over the long term.
It is a pretty robust solution. There are a lot of customizations that you can put in for what you want it to be checking, viewing, and alerting on. As we get alerts and realize that something is not worth being alerted on, or that it happens to be normal behavior, a lot of that information can be put back into the system, to say, "Alright, this may look like an anomaly, but it isn't." Therefore, we can customize it so it gets smarter as it goes on, and we're really only being notified for actual issues rather than suspected issues.
It's been helpful to have very specific information about the issues to pass along to development. E.g., we can see an anomaly during certain periods of time while the software is running, then pass that along so development can figure out, "Is it a database issue, an application issue, or possibly a DNS-level issue?" They also determine if there are further things that need to be dug into or if it is something that can just be fixed by a code change.
The solution’s automated and agentless discovery, deployment, and configuration seem to work pretty well for standard pieces, like Windows servers and your standard hardware. It has been able to find and add those pieces in. Normally, if I'm running into an issue with finding something, it's because a plugin or piece is missing and needs to be added in manually. However, 99 percent of the time, it finds things automatically without a problem.
What is most valuable?
The flexibility to be able to build a custom monitor is its most valuable feature. A general CPU or memory check doesn't always give you the full picture, but we can dig into it, and say, "These services are using this much, and if these services are using more than 50 percent of the CPU, then alert us." We can put those types of customizations in rather than use the generic out-of-the-box things with maybe a few flags. It's been very nice to be able to customize it to what we need. We can also put in timings if we know there are services restarting at 11 o'clock at night (or whenever). We can put those in so that, as long as it's doing exactly what we want it to do, which is restarting the service, it won't alert us. However, if there are any issues or errors, then it alerts us right away. That's been really helpful to leverage.
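As a rough illustration of the kind of rule described above, here is a minimal Python sketch. It is not LogicMonitor's configuration syntax or API. The names `ServiceSample`, `should_alert`, and `RESTART_WINDOW`, as well as the 15-minute window length, are hypothetical and only there to show the logic: alert when a service goes over 50 percent CPU, stay quiet during an expected restart at 11 o'clock at night, and always alert on errors.

```python
from dataclasses import dataclass
from datetime import datetime, time

# All names and values below are hypothetical, for illustration only.
CPU_THRESHOLD_PERCENT = 50.0                   # "more than 50 percent of the CPU"
RESTART_WINDOW = (time(23, 0), time(23, 15))   # assumed nightly restart window


@dataclass
class ServiceSample:
    name: str
    cpu_percent: float
    taken_at: datetime
    restarting: bool = False
    has_error: bool = False


def in_restart_window(moment: datetime) -> bool:
    """Return True if the sample was taken inside the expected restart window."""
    start, end = RESTART_WINDOW
    return start <= moment.time() <= end


def should_alert(sample: ServiceSample) -> bool:
    # Errors always alert, even during the restart window.
    if sample.has_error:
        return True
    # A clean restart inside the expected window is suppressed.
    if sample.restarting and in_restart_window(sample.taken_at):
        return False
    # Otherwise, alert only when the service exceeds the CPU threshold.
    return sample.cpu_percent > CPU_THRESHOLD_PERCENT
```

In this sketch, a service at 80 percent CPU at noon would alert, while the same reading during a clean 11:05 PM restart would not.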
We use a few dashboards. A couple are customized for specific groups and what they maintain. As I am doing projects, I'm able to make a quick dashboard for some of the things that I'm working on so I can keep track without having to flip between multiple pages. It seems pretty flexible for making simple use cases as well.
I have a custom dashboard which monitors each site and does virtual environment monitoring, such as CPU, memory, timing, etc. It was easy to get in place and adjust for what I wanted to see. It has been one of the go-to dashboards that I have ended up utilizing.
We get something close to a single pane of glass and can view specific functions, whether it be sites or the entire environment. We are able to quickly get in and see what's going on and where issues are coming from, rather than having to hunt down where those issues are. Therefore, it's helped us more with our workflow than with automating functions.
The solution’s overall reporting capabilities are pretty powerful compared to ones that I have used previously. It seems like it has a lot of customizations that you can put in, but some of the out-of-the-box reports are useful too, like user logon duration and website latency. Those types of things have been helpful and require few, if any, changes to get useful content out of them. They have also been pretty easy to implement and use.
What needs improvement?
It needs better access for customizing and adding monitoring from the repository. It seems like you have to search through the forums to figure out what specific pieces you need to get in for specific monitoring, if it's a nonstandard piece of equipment or process. You have to hunt and find certain elements to get them in place. If they could make it a bit easier, rather than having to find the right six-digit code to put in so it implements, that would be helpful.
For how long have I used the solution?
Personally, I've been using the solution for about a year. We've had it in place for about a year and a half, but I came to the organization about a year ago.
What do I think about the stability of the solution?
I don't think we've really had a time where the application or monitoring nodes have failed. The connection to LogicMonitor has been very stable. We haven't had any connection issues to the SaaS offering. It's been pretty resilient and stable from our end.
What do I think about the scalability of the solution?
The scalability seems fine. Every time we've had to expand and add elements, we've not run into any delays or issues with it. It seems to expand with us as we've needed to use more features. We haven't had any issues with delays or timing. It's been able to handle what we've thrown at it.
There are at most 10 users at our company, who do everything from application monitoring to platform engineering to some developers who have access into the solution for some monitoring pieces. Varying segments have been able to get in and they all seem to have had pretty good luck with accessing and using it.
We are using LogicMonitor pretty extensively. We're using it from lower-level environments, development, and quality assurance, all the way up to user testing and production. We have leveraged it in as many segments and parts of the business as we can. It has been really helpful to have it be able to handle different workloads, but also be customized. This way, we're not getting paged at 2:00 AM because a switch in the office is reporting an issue. Instead, we can adjust those timings to report at specific times of the day rather than at any time during the day.
We have about 1,000 devices in total, including VMs and physical devices.
How are customer service and technical support?
The technical support has been pretty good. I haven't had to leverage it myself, but some of the people I work with have reached out when we have had questions or issues. Support seems to be fairly responsive and the timing is usually good. We are usually hearing back in minutes instead of hours. We haven't had any major issues with them.
Which solution did I use previously and why did I switch?
We've eliminated three different monitoring tools by leveraging LogicMonitor. We had two different in-house, custom built tools that were used for a long time that we were able to roll off, and we also used Nagios. I have also used Zabbix and Orion.
LogicMonitor has reduced our number of false positives compared to how many we were getting with other monitoring platforms. We leveraged the solution to focus it down and only look at the specific things that need monitoring. E.g., rather than getting notified every time a service is down, if it's not a critical service, we can just get a flag, then go back and check it. This is better than getting spammed with hundreds of emails about specific things being down. Thus, we can customize it so that we are notified about what we actually want and need to know, and not about non-issues.
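The idea of notifying only for critical services and simply flagging the rest can be sketched in a few lines of Python. This is an assumption-laden illustration, not how LogicMonitor implements or configures it. The service names, the `CRITICAL_SERVICES` set, and the `route_service_down` and `summarize` functions are all hypothetical.

```python
from enum import Enum

# Hypothetical list of services that warrant an immediate notification.
CRITICAL_SERVICES = frozenset({"billing-api", "auth-service"})


class Action(Enum):
    NOTIFY = "notify"   # send an email or page right away
    FLAG = "flag"       # record it so someone can go back and check later


def route_service_down(service_name: str) -> Action:
    """Decide what to do when a service is reported down."""
    return Action.NOTIFY if service_name in CRITICAL_SERVICES else Action.FLAG


def summarize(down_services: list[str]) -> dict[Action, list[str]]:
    """Group a batch of down services by the action each one would get."""
    routed: dict[Action, list[str]] = {Action.NOTIFY: [], Action.FLAG: []}
    for name in down_services:
        routed[route_service_down(name)].append(name)
    return routed
```

With a routing rule like this, only the critical services produce a notification, and everything else is collected as flags to review later instead of generating an email each.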
How was the initial setup?
It had already been implemented before I joined the company. We've added a few functions since then, but the core and initial launch had already been completed, and it was heavily used by the time I joined.
What was our ROI?
We have definitely seen ROI.
We have seen probably an 80 or 90 percent decrease in false flag alerts.
We have moved our people so they're able to be more proactive, rather than having to parse through alerts and figure out if something is an issue or a non-issue. That cuts down on the personnel time spent managing the day-to-day processes, which has been helpful. At least from conversations I've had with management, they seem to have found it to be a good investment and solution for getting our normal work done, but also for making sure that we're ready to go if something does go wrong.
What's my experience with pricing, setup cost, and licensing?
It definitely pays for itself in the amount of time we're no longer spending on false errors or on things that our monitoring wasn't quite handling before. It has been good cost-wise.
What other advice do I have?
I would definitely recommend LogicMonitor. It's something to look at, whether by signing up for a trial or by working through a use case. It's been a great product. It has customizations when you want them, and out-of-the-box solutions when you don't. It works and is reliable. Compared to other monitoring platforms I've used in the past, it seems to be the most powerful and robust that I've dealt with.
The solution monitors most devices out-of-the-box, such as Windows, Windows Server, Linux, F5 load balancers, Cisco firewalls, and Cisco switches. Those have been pretty easy to monitor. Our issues have been with one-off or nonstandard platforms that we've implemented. Otherwise, everything has been pretty easy to implement.
I would rate it as a solid nine (out of 10).