Enigma NMS Review

Notifications, custom OIDs, and good graphs are key for us, but it needs more flexible alerting

What is our primary use case?

We are a wireless internet service provider (WISP). We use it to monitor an average of about 10 metrics per device on 11,000+ devices (currently), polling every two minutes. The devices are customer-premises equipment (CPE).

How has it helped my organization?

Before Enigma, we did not have a monitoring system capable of holding data from all of the devices we wanted to monitor. If customers called complaining of an intermittent problem, our techs would have to tell them "We will put you on monitor and look at the data collected after a week or so." Now, we can simply look at the past month of data that has been collected for any customer that calls in with a problem. This is a huge improvement. 

What is most valuable?

It was difficult to find a product with all the features we want and that would not cost a fortune. I have to say that the decision to purchase Enigma came not because of a particular set of features, but because the developers are very fast to develop features if they agree that the feature is desirable. 

Enigma still does not have all the features we want, but it has enough of the most desired features: 

  • scalability
  • notifications
  • custom OIDs
  • good graphs
  • small polling interval
  • long term data retention
  • single poller

What needs improvement?

We would like to see more flexibility with alerting. Since our adoption of Enigma, it has improved greatly in this area, but there are still limitations in how alerting thresholds can be configured that we would like to see improved.

I would also like to see dynamic grouping. For example, in our case, we have Subscriber Units that are connected to APs. These are constantly being re-homed, or pointed to a different AP on the same tower or a different tower altogether. There is an SNMP value corresponding to the AP name, and one for the AP MAC address. I would like Enigma to be able to form groups based on the AP name (or the MAC address, either one), which are dynamically changed. That said, we have been able to code some things to automate grouping to a degree, using the REST API, which is a growing feature in Enigma as well as many other software products in general.
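As a rough illustration of the grouping logic described above, here is a minimal sketch in Python. It is not Enigma's API: the device names, AP names, and both function names are hypothetical, and the step that would push the resulting changes to Enigma through its REST API is left out because the endpoints are not described in this review. It only shows how Subscriber Units could be regrouped by the AP name each one currently reports, and how re-homed units could be detected between two polls.

```python
from collections import defaultdict

def group_by_ap(polled):
    """Group device names by the AP name each device reported via SNMP.

    `polled` maps a device name to its AP name (hypothetical input shape).
    """
    groups = defaultdict(set)
    for device, ap_name in polled.items():
        groups[ap_name].add(device)
    return dict(groups)

def membership_changes(old, new):
    """Return {ap_name: (added, removed)} for every group that changed."""
    changes = {}
    for ap in old.keys() | new.keys():
        added = new.get(ap, set()) - old.get(ap, set())
        removed = old.get(ap, set()) - new.get(ap, set())
        if added or removed:
            changes[ap] = (added, removed)
    return changes

# Hypothetical example: su-002 is re-homed from AP-North-1 to AP-East-2.
before = group_by_ap({"su-001": "AP-North-1", "su-002": "AP-North-1"})
after = group_by_ap({"su-001": "AP-North-1", "su-002": "AP-East-2"})
changes = membership_changes(before, after)
```

Running this on every polling cycle and applying only the changes keeps the number of group updates small, since most units stay on the same AP between polls.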

I would also like to see Enigma move away from CentOS 6.5 and onto a more current platform, as well as away from MyISAM tables to InnoDB. No improvements are being made to MyISAM, and this has been the case for several years, from what I understand. So, this code is going to become more and more outdated by that virtue alone. The developers say that MyISAM tables are faster. I did some reading up on it, and it seems that at one point, when InnoDB was new, it was slower than MyISAM, but InnoDB has made major improvements since then.

For how long have I used the solution?

Three to five years.

What do I think about the stability of the solution?

I did encounter issues with stability, but I believe it was a limitation of disk speed. A monitoring system that is potentially performing thousands of read and write operations every second will very quickly bottleneck on disk I/O. You need a fast disk system. We caused a hard crash that was unrecoverable because of this. In another incident, we lost a large amount of data. In response, we decided to limit the data retention to 30 days, so that the table sizes are limited and read-write operations suffer from less latency. The bigger your tables are, the longer it takes to seek through them to find the correct read or write position, and to my understanding this cost keeps growing as the tables grow.
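To put rough numbers on that load, here is a back-of-the-envelope sketch in Python using the figures quoted in this review (about 10 metrics on each of 11,000 devices, a two-minute polling interval, 30 days of retention). It assumes one stored sample per metric per poll, which is a simplification; how Enigma actually lays out its tables is not something I can vouch for, and the function name is just for illustration.

```python
def polling_load(devices, metrics_per_device, poll_interval_s, retention_days):
    """Estimate the sample rate and retained sample count for a poller.

    Assumes exactly one stored sample per metric per polling cycle.
    """
    total_metrics = devices * metrics_per_device
    samples_per_second = total_metrics / poll_interval_s
    polls_per_day = 86_400 // poll_interval_s
    samples_retained = total_metrics * polls_per_day * retention_days
    return total_metrics, samples_per_second, samples_retained

total_metrics, samples_per_second, samples_retained = polling_load(
    devices=11_000, metrics_per_device=10, poll_interval_s=120, retention_days=30
)
print(total_metrics)                 # 110000 metrics
print(round(samples_per_second, 1))  # about 916.7 new samples per second
print(samples_retained)              # 2376000000 samples kept at 30 days
```

Under that assumption, new samples alone account for roughly 900 writes per second before any index updates, graph reads, or alert checks are counted, which is why the disk system matters so much.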

What do I think about the scalability of the solution?

So far, as mentioned earlier, we have been able to provide graphs on at least 110,000 metrics spread across more than 11,000 devices. Enigma automatically begins monitoring interfaces that are up and active, and a few different factors go into that decision. This is not really very configurable (you can enable interfaces manually, but disabling them once they are enabled was still kind of sketchy last time I checked), but it works well enough by itself that it requires little to no maintenance.

Environment monitors are handled by different code than the interface monitors (in CentOS, you can see all the different scripts that are run through a Linux cron job), which means that they are logically separated to a certain extent, but are pretty well integrated into the GUI. 

For example, I gather that NETSAS has a difficult time relating database objects in the direction of Environment Monitor-TO-Device, which is, I believe, why we still don’t have the dynamic grouping feature I mentioned earlier. But when viewing an individual device, you can easily view all the Environment Monitors related to it.

How is customer service and technical support?

The level of tech support is top-notch. They are in Australia, so we have to wait until afternoon for a response, but they take care of us, and most problems are resolved within the same day.

Which solutions did we use previously?

WhatsUp Gold for monitoring CPE, but now all CPE monitoring is done by Enigma.

We still use WUG today because it has very flexible alerting configurability, but it just cannot scale. It was not able to handle more than several hundred nodes before its performance would suffer significantly. And it was even thinning out the data tables (rolling up data) starting at an age of 12 hours (by default). WUG may have improved since the decision was made to move away from them, but we felt that the Linux/MySQL platform would be better able to handle the larger-scale demands without costing us a lot of licensing money. 

For example with WUG being Microsoft SQL only, one must spend quite a bit of money to get the SQL version that will use more than 64GB of RAM. Currently, we have Enigma on a VM in vSphere with 100GB RAM, 12 CPU cores (Cisco UCS Mini with B200 blades, and NetApp FAS2500 Series). To increase the processing capacity, all we have to do is give it more resources (RAM, CPU, Storage).

One thing I should mention is what we are not using it for, and that is server monitoring. We don’t monitor any VMs, Windows, or Linux systems (besides Enigma itself). So I can’t really speak to aspects in that realm that don’t cross over into the network device realm. Having said that, Enigma has a lot of specific detail coded into it for discovering and monitoring Cisco devices. If you are a Cisco shop this might be a major consideration.

How was the initial setup?

It’s hard to generalize and be objective in this assessment because I was new to network monitoring, and new to ISP operations, and new to SNMP, and new to Linux. Just about everything was new to me. Someone with better background knowledge would probably have been much faster than I was. 

If you are a company that is on a private network behind a firewall, I would say just let Enigma go to town auto-discovering the entire subnet. Looking back, I would recommend deciding on IP numbering conventions, and device and interface naming conventions, and implement those conventions before doing any discovery, but we didn’t do that. The benefit didn’t outweigh the cost. I would say if you decide beforehand on naming, and what you want to monitor, and what thresholds you want to alert on, and who needs to be notified about what, it would help significantly.

What about the implementation team?


What was our ROI?

We currently don't have the capacity to calculate ROI on something like this. The ROI is more of a customer satisfaction level or quality of service that we wanted to achieve, not necessarily related to quantity of time spent. 

What's my experience with pricing, setup cost, and licensing?

I don’t know what the pricing is currently, but there are different price levels. We got an unlimited license, and considering what we get for the amount we paid, it’s a good deal. The other licenses are limited on number of devices (not monitors/metrics).

Which other solutions did I evaluate?

Nagios XI, Statseeker, WhatsUp Gold, PRTG, InterMapper, Cacti.

Some of these fell out of the running pretty quickly just on the basis of the feature list or the pricing model (i.e., licensing per interface, or worse, per metric), so I didn’t necessarily play with all of them. A close second was Statseeker, which was fast but didn’t allow custom OIDs at the time, which was a big deal-breaker.

What other advice do I have?

It is always improving. Take a look at their release notes and you will see the pace. 

I like the philosophy of the developers, which is to listen to customer feedback and develop whichever features they think are desired most. Since they are a small company, they can do this with quite an impressive turnaround time. There have been multiple features that we have requested, and received immediate feedback on, in the form of a feature addition in the next release. This is beneficial but has drawbacks as well. Sometimes, the new code has not been tested thoroughly enough and thus does not work as expected right away, but these are quickly resolved if you pipe up about them.

Enigma has lots of features out of the box. You don’t have to be super technical to get it going, though every bit of general understanding you can get about Linux, monitoring, and databases will help. Again, as with any monitoring system, if you are going to be polling more than a few thousand metrics, make sure you have a disk system that can handle the load (all-flash would be best). I hope this info helps you make an objective decision.

Disclosure: I am a real user, and this review is based on my own experience and opinions.