What is our primary use case?
We collect from our primary devices and our endpoints and we look to identify any concerns around regulatory requirements in business use. We have payment card industry regulations that we are monitoring, to make sure everything's going the way it's supposed to, as well as for HIPAA, HITECH, and general security practices.
How has it helped my organization?
In terms of seeing a measurable decrease in the mean time to detect and respond to threats, we live in the Web Console and we see things right away when they come in, and then we triage.
What is most valuable?
There's value in all of it. The most valuable is the reduction in time to triage. We take in around 750 million logs a day. We have a lot of products and that would be a lot of different panes of glass that we would have to look through otherwise. By centralizing, we can triage and take steps much more quickly than if we tried to man all the interfaces that come with the products.
What needs improvement?
There are two improvements we'd like to see. I mentioned these last year and they haven't implemented them yet.
The first one is service protection. I have Windows administrators who will remove the agent when they think that that is what's fouling up their upgrade or their install or their reconfiguration, etc. The first thing they do is to turn off the antivirus, turn down the firewall, and take off anything else. They don't realize that the LogRhythm agent is just sitting there monitoring. Most antivirus products have application protection features built in where, if I'm an admin on a box, I can't uninstall antivirus. I need to have the antivirus admin password to do that.
Why does the LogRhythm agent not have that built-in so that I don't have well-intended admins removing things or shutting off agents? I don't like that.
The second one is, you can imagine my logging levels vary. We do about 750 million a day and some days we do 715 million. Some days we do 820 million or 1.2 billion. But there's no way to drill in and find out: "Where did I get 400,000 extra logs today? What was going on in my environment that I was able to absorb that peak?" I have no way to identify it without running reports, which will produce a long-running PDF that I have to somehow compare to another long-running PDF. I have to analyze it and say, "Well, last month, the Exchange entity was only averaging this many logs. Now it jumped up this much. It could have been that." But then, if I find something that spiked, I still have to make sure nothing else bottomed out, because there might be a 600,000 log delta if something else wasn't producing as many logs as it normally does.
I would like to see profiling and behavior awareness around systems, like they've begun to do around users with UEBA.
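To illustrate the kind of per-source comparison described above, here is a minimal Python sketch. It is not a LogRhythm feature or API. It assumes that per-source daily log counts can be exported somehow into plain dictionaries, and the source names, numbers, and the `volume_deltas` helper are all hypothetical. It ranks sources by how far today's volume moved from a baseline, in either direction, so that a spike and a drop that partly cancel out are both visible.

```python
from typing import Dict, List, NamedTuple


class Delta(NamedTuple):
    source: str
    baseline: float
    today: float
    change: float  # today - baseline; negative means the source went quiet


def volume_deltas(
    baseline: Dict[str, float],
    today: Dict[str, float],
    min_change: float = 0.0,
) -> List[Delta]:
    """Return per-source changes, largest absolute change first.

    Sources missing from either side are treated as having zero logs,
    so new sources and sources that stopped logging both show up.
    """
    deltas = []
    for source in set(baseline) | set(today):
        b = baseline.get(source, 0.0)
        t = today.get(source, 0.0)
        change = t - b
        if abs(change) >= min_change:
            deltas.append(Delta(source, b, t, change))
    # Sort by size of the move, then by name for a stable order.
    deltas.sort(key=lambda d: (-abs(d.change), d.source))
    return deltas


# Hypothetical daily counts per log source (assumed data, not real figures).
baseline = {"exchange": 1_000_000, "firewall": 5_000_000, "domain-controllers": 2_000_000}
today = {"exchange": 1_600_000, "firewall": 4_800_000, "domain-controllers": 2_000_000}

report = volume_deltas(baseline, today, min_change=50_000)
net = sum(d.change for d in volume_deltas(baseline, today))

for d in report:
    print(f"{d.source}: {d.change:+,.0f}")
print(f"net change: {net:+,.0f}")
```

With these made-up numbers, the net change is 400,000 extra logs, but the breakdown shows one source up by 600,000 and another down by 200,000, which is the situation that is hard to spot when comparing two PDF reports by hand.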
What do I think about the stability of the solution?
It's a well-written platform. That being said, with our log levels, we ultimately have almost 30 servers involved. Some of them are very large servers. It will bury itself quickly if there's a problem.
I find the product to be well-written and very efficient. However, sometimes the error logging is not altogether helpful. For example, on an upgrade, a systems data processor, a Windows box, was throwing an error code like 1083. Then it just stopped and it died right out of the installer and nobody looked. We searched through Google and what it means is the Windows Firewall wasn't turned on so that it could create a rule for the product. Why wouldn't they bubble up that description so that I wouldn't have to call support and I could just know, "Okay, the firewall wasn't turned on. Turn it back on. Re-run the installer and keep going"?
There have been many times where I've been disappointed, where I'll ramp an agent up to Verbose and it will say, "LogRhythm critical error, the agent won't bind to a NIC," or the like. I end up with no really actionable or identifiable information coming in, even though I've ramped up the logging level.
There's room for the solution to grow in those situations, especially with regard to a large deployment where it can quickly bury itself if it can't bubble up something meaningful. I need to be able to differentiate it from other stuff that can be triaged at a much lower priority.
What do I think about the scalability of the solution?
The scalability is good. We're deployed in two data centers at the moment. We had a little bit of difficulty implementing a disaster recovery situation because it was leveraging only Microsoft native DNS and it wouldn't work with the Infoblox DNS deployment that we use in our environment. They've been working on that behind the scenes. That's one of the things that is queued up for me next.
In terms of scalability, volume-wise, the product works very well. As far as the DR piece goes, I think there's room to improve that.
How is customer service and technical support?
Tech support is good. There are a lot of guys that know what's going on. Sometimes, though, I've stood my ground saying, "I don't want to do that." If we have a problem with a server, we can bounce it and maybe it starts running right, but then we don't know what was wrong. We can't do anything about it in the future except bounce it again because that's what worked last time. Sometimes I need to push them and say, "Okay, I want to identify what's wrong. I want to see if I can write a rule that will show me when something's happening," or "I want to figure out if there's something wrong with my scaling and my sizing."
I like support. I think they're customer-focused. But sometimes it seems they've got a lot of tickets in the queue and they want to do the "easy-button." I push back more on some of that. It could just be a situation where the logs aren't going to have that information, and they already know that, but they don't want to say, "Well, our logging is not sufficient. This is the best way forward."
Which other solutions did I evaluate?
What I find is that there are die-hard Splunkers. The problem is that Splunk is not affordable at a large scale. QRadar is not any better. It's just as bad. LogRhythm, for the price point, is the most reasonable, when you begin to compare apples to apples.
What other advice do I have?
From a performance standpoint, I have no problems recommending LogRhythm because it allows me to get in under the hood and tweak some things. It also comes with stuff out-of-the-box that is usable. I think it's a good product. Things like this RhythmWorld 2018 User Conference help me understand the company's philosophy and intentions and its roadmap, which gives me a little more confidence in the product as well.
Regarding playbooks, we have Demisto which is a security orchestration automation tool, and we're on LogRhythm 7.3. Version 7.4 is not available yet because of the Microsoft patch that took it down. We're looking to go to 7.4 in our test environment and to deploy up to that. I'm not quite sure how its automation, or the playbook piece, will compare with Demisto, which is primarily built around that area and is a mature product. However, from a price point, it is probably going to be very competitive.
In terms of the full-spectrum analytics, some of the visualizations that we have available via the web console are, as others have expressed, short-lived, since they're just a snapshot in time. Deploying Kibana, by contrast, will perhaps give us a trend over time, which we also find to be valuable. We're exploiting what is native to the product, but we're looking to improve on that by going with either Kibana or the ELK Stack to enrich our visualizations and depict greater time periods.
We have somewhere north of 22,000 log sources and we average a little over 12,000 messages per second.
The staff for deployment and maintenance is myself - I'm the primary owner of this product - and I have one guy as a backup. The rest of my team will use it in an analysis role. However, they're owning and managing other products. It's a very hectic environment. We're probably short a few FTEs.
One thing that we've yet to implement very well is the use of cases and metrics. Oftentimes, if we see something that we know at a glance is a false positive, we're not going to make a case out of it. We might not close it for a day or two because we know it's nothing, and because we're busy with other things since we are a little bit short on staff.
In terms of our security program maturity, we have a fairly mature environment with a lot of in-depth coverage. The biggest plus of LogRhythm is that we can custom-write the rules based on the logs and then speed up time to awareness, the mean time to detect. I can create an alarm for virtually anything I can log.