What is our primary use case?
We are using the solution for on-prem, all our applications, and network monitoring. It fits everything. We use it for monitoring and reporting on our ESX, Pure Storage, Cisco, F5, and Palo Alto environments. We also use it for alerting, graphing, and capacity planning. We use it for everything.
We are using the latest version. We have LogicMonitor Collectors onsite in our data center, but the dashboard and everything else follow the cloud model. We use both AWS and Azure as our cloud providers.
How has it helped my organization?
It has improved our organization with its capacity planning. We have a performance environment that we use to benchmark our applications. We use it to say, "Okay, at a certain level of concurrency, we know where our application will fall over." Therefore, we are using LogicMonitor dashboards to tell us that we're good. Our platform can handle X number of clients concurrently hitting us at a time. That's how we use it to size our business, e.g., size our ESX environment and Internet pipes.
Our capacity planning team consumes the data on the dashboards. The bread and butter of using the data in the dashboards is to inform, "Hey, what upgrades do we need to make in six months?" So, that data gets consumed regularly by other teams.
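As a rough illustration of the "what upgrades do we need in six months?" question, here is a minimal Python sketch that fits a straight line to monthly utilization numbers and estimates when a limit would be reached. This is not something LogicMonitor does for us or a LogicMonitor API. The function name `months_until_limit`, the sample values, and the 80 percent limit are all made up for the example.

```python
def months_until_limit(samples, limit):
    """Estimate months from the last sample until `limit` is reached.

    `samples` is a list of at least two monthly utilization values
    (hypothetical numbers, e.g. percent of link capacity). A straight
    line is fitted with ordinary least squares. Returns None when the
    trend is flat or falling, since the limit is then never reached.
    """
    n = len(samples)
    xs = range(n)
    mean_x = sum(xs) / n
    mean_y = sum(samples) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, samples)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    if slope <= 0:
        return None
    intercept = mean_y - slope * mean_x
    # Month index where the fitted line crosses the limit, measured
    # relative to the most recent sample (index n - 1).
    return (limit - intercept) / slope - (n - 1)


# Made-up example: utilization grows 5 points a month from 50 percent.
print(months_until_limit([50, 55, 60, 65], limit=80))
```

A straight line is the simplest possible trend. Real growth is often seasonal or bursty, so treat the result as a conversation starter and not a forecast.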
In the three and a half years that I've been using it, we haven't had false positives. I'm the primary network engineer, so I can say with confidence, "We have the environment tuned to the point where we don't get false positives."
What is most valuable?
Its historical reporting: I can go into my production F5s and look at the CPU, memory transactions, application transactions, and bandwidth utilization. Then, I can use all of the graphing metrics. I can have a dashboard for my production environment and all of my critical elements where I can graph utilization over time and use it for capacity planning. It's a single pane of glass for everything about your environment health.
We build our own dashboards, creating dashboards for our various environments. It is all written in HTML5, so it's super easy to drag and drop, move things around, expand, and change dates. It's awesome. We can get as detailed as we want or roll up to a manager/director level. I like its ease of use.
I don't do much with reporting because the dashboards are good enough that they tell the story. I haven't actually clicked on the reports tab in quite a while, so we're probably underutilizing that. If you just go into a dashboard, and say, "Show me my F5 health for the last six months," the dashboard is good enough for that.
I have custom data sources for various things. With data sources, you can go down the rabbit hole real quick because they're very powerful. You can go to the LM Exchange, grab data sources, pull them down and put them into your installation, and then you can tweak them. The idea of a data source is that it matches on criteria. For example, say I have a collection of Cisco devices along with a collection of F5 and Palo Alto devices. There are generic match criteria which say, "Is a Cisco. Is an F5. Is a Palo Alto." However, it also has all these other match conditions. Therefore, you can build regex filters or match on 10 Gigabit Ethernet, but not 1 Gigabit Ethernet. You can get super deep in the weeds, and it can get complicated pretty quick, but their support is fantastic.
The solution provides us with granular alert-tuning for devices. E.g., I can use it for application website checks, where I can set up an automated check from a bunch of different test facilities. So if I want to check my application, I can ping it from five locations. I can tune the data source so that if the response time is ever greater than 500 milliseconds, it lets me know. I also can tune it so it won't alert me on one fail, but alert me on three fails. For any data source that you're collecting for, you can set thresholds for notice, warning, critical, and what to do if it fails one, two, or three times. You can just go crazy tuning it.
We found the solution monitors most devices out-of-the-box, such as F5, Cisco, Palo Alto, ESX, Pure Storage, Windows database connectors, ActiveBatch, and Rubrik.
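To give a feel for the "10 Gigabit but not 1 Gigabit" kind of filter, here is a small sketch in plain Python. It is not LogicMonitor's own filter syntax. The interface names follow the usual Cisco naming style but are made up for the example, and `matching_interfaces` is a hypothetical helper.

```python
import re

# Hypothetical interface names; real ones depend on the device.
interfaces = [
    "TenGigabitEthernet1/0/1",
    "GigabitEthernet1/0/2",
    "TenGigabitEthernet1/0/3",
    "Vlan100",
]

# Anchoring at the start of the name is what keeps plain
# "GigabitEthernet" ports out: without the ^, the pattern
# "GigabitEthernet" would also match inside "TenGigabitEthernet".
TEN_GIG = re.compile(r"^TenGigabitEthernet")


def matching_interfaces(names):
    """Return only the interface names the filter applies to."""
    return [name for name in names if TEN_GIG.search(name)]


print(matching_interfaces(interfaces))
# ['TenGigabitEthernet1/0/1', 'TenGigabitEthernet1/0/3']
```

The anchor is the part that is easy to get wrong, and it is a good example of why this tuning can get complicated quickly.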
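The "don't alert on one fail, alert on three" idea can be sketched in a few lines of Python. This only illustrates the logic and is not how LogicMonitor implements it. The name `should_alert` and its parameters are hypothetical, and the defaults come from the numbers mentioned above (500 milliseconds, three fails).

```python
def should_alert(response_times_ms, threshold_ms=500, consecutive=3):
    """Return True once `consecutive` checks in a row exceed the threshold.

    A single slow check resets nothing by itself; any check at or under
    the threshold resets the streak back to zero.
    """
    streak = 0
    for response_time in response_times_ms:
        if response_time > threshold_ms:
            streak += 1
            if streak >= consecutive:
                return True
        else:
            streak = 0
    return False


# Two slow checks separated by a healthy one: no alert.
print(should_alert([100, 600, 120, 700, 650]))  # False
# Three slow checks in a row: alert.
print(should_alert([100, 600, 700, 650]))       # True
```

Requiring a streak trades a slightly later alert for far fewer pages on one-off blips, which is a big part of how you get to an environment without false positives.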
What needs improvement?
The ease of use with data source tuning could be improved. That can get hairy quickly. When I reach out for help, it's usually around a data source or event source configuration. That can get challenging.
For how long have I used the solution?
I joined NWEA about three years ago and was new to LogicMonitor at that time. Three and a half years is how long I've been using it.
What do I think about the stability of the solution?
The stability is perfect. It is 100 percent.
Right now, five or six people collectively administer it across the organization. It doesn't take day-to-day massaging.
What do I think about the scalability of the solution?
We have close to 50 users utilizing the solution. It's mostly a production/operations audience. My Ops team has a couple hundred people, but I doubt that many of them would be consuming the dashboards on a regular basis.
The product is being used extensively. It's completely a part of our production environment. We couldn't maintain our environment without it. It's production-impacting.
I've never been presented with a scenario where it didn't scale.
How are customer service and technical support?
Their support is fantastic. The support is always super friendly and helpful.
From the dashboard, you click support. You chat with an engineer, saying, "I'm trying to clone this data source that already exists and I want to tweak it so it only applies to interfaces with this tag." You can clone a data source, tweak it to match what you want, negate the things you don't want, and then you have a new data source. You can take all of their stuff out-of-the-box, and it generally works, then you tweak it as needed. So, data sources are pretty easy to use.
Which solution did I use previously and why did I switch?
I think my team was using Nagios before. That's just a burning trash heap of an old application.
In my organization, as a whole, we have many chefs in the kitchen. We, the infrastructure team, picked LogicMonitor, then we moved all our stuff to it. However, the database team still relies on Nagios because they're like dinosaurs. DevOps uses Sensu, Prometheus, collectd, SIEM, and a laundry list of others. The only reason why LogicMonitor hasn't consolidated everything is because our teams have the freedom to choose their own tools, and we do. Unfortunately, we tend to overspend on duplicate functionality. I don't think it's because LogicMonitor can't do it, but because the infrastructure team picked it, the DevOps team was like, "Well, that's your guys' tool. You guys use it. We're going to go pick our own thing." We were like, "Okay, go ahead."
How was the initial setup?
I know that we have added extra Collectors, and it's super simple. We get to a point where we have too many instances on a Collector and it starts working too hard because it's just a VM. So, we spin up another Linux VM, download their Collector code, install it, and then we have another Collector running in 30 minutes. It's pretty straightforward. We add Collectors fairly regularly, and it's pretty easy.
I know getting it installed is not that big of a deal, but getting things migrated off of old stuff can be time-consuming. However, I wasn't around for it.
If we were implementing LogicMonitor now, we would need to identify when to pull the plug on Nagios, then identify what we wanted to monitor so we were not running duplicates.
What about the implementation team?
One person is needed for a new LogicMonitor deployment.
What was our ROI?
We use LogicMonitor for our alerting and integrate it with PagerDuty for on-call paging. That is key to operational uptime. We live and die by the number of SEV-1, SEV-2, and SEV-3 outages and by our uptime. It is absolutely critical that LogicMonitor alerts PagerDuty, which alerts the on-call. We are reducing the impact of incidents using the tool by alerting on incidents that we can respond to.
What's my experience with pricing, setup cost, and licensing?
I don't know what we spend on LogicMonitor, but I know that Cisco Prime is a multiple six-figure solution. Therefore, I know we are saving at least several hundred thousand dollars in that we're not buying Cisco Prime.
We pay for the enterprise tech support.
Which other solutions did I evaluate?
The organization I came from had a huge SolarWinds deployment. We also used Nagios, Cacti, and OpenNMS, which is an open source NMS platform. Unfortunately, I've had to do some work with Cisco Prime as well, which used to be called CiscoWorks. I installed Cisco Prime for a handful of clients in a past life.
- Pros of LogicMonitor: Ease of installation and use.
- Cons of LogicMonitor: Tuning data sources can be a bit labor-intensive. However, once you get it set up, it's pretty straightforward.
Having worked with OpenNMS, Cisco Prime, and SolarWinds, just the cost and complexity of those solutions is ridiculous. I would never advocate going back to that black hole.
What other advice do I have?
We're fairly self-sufficient. We already use Puppet for automation, and we're starting to move some workloads to Ansible. However, we wouldn't ask LogicMonitor to help us with automation.
Biggest lesson learnt: Know what you want to monitor and what threshold you want to alert at. E.g., if you don't do anything and just start monitoring out-of-the-box, it works. However, if you don't set thresholds, it's not telling you when to take action. So, if you just add things to LM and start monitoring them, you're not done. Until you've set a threshold for where something is actionable, you haven't really finished the job. That's my experience with NWEA. You can click on anything that we've been monitoring, and if you don't have any thresholds set, then you're just making pretty graphs.
I would rate the solution as a 10 (out of 10). I am a fan of the product. It's great.
Which deployment model are you using for this solution?