What is our primary use case?
Our primary use case is network monitoring, and security goes hand in hand with that. They're two sides of the same coin. From a network-monitoring perspective, we keep an eye on network links at all times, monitoring bandwidth utilization. It allows us to quickly identify what is consuming bandwidth on a link.
On the security side, it allows us to see issues that occur in the network. Someone might be running up a Tor session. Someone might be trying to hack into something internally or externally. Or there might be excessive use against a particular host or a particular port on one of our hosts.
So those two use cases go hand in hand.
It's strictly on-prem. We're a financial organization within Australia and our government regulators say that you must keep all your data, whether it be financial or IP addressing or network related, on-prem. We run a virtual machine with 250 endpoints.
How has it helped my organization?
Scrutinizer helps enrich the data context of network traffic. For example, one of our sub-organizations is primarily responsible for stock trading. They use a time-critical stock trading application called IRESS, here in Australia. I believe it's similar to a Bloomberg-based system in the U.S., but it's based across the Australian stock exchange. That sub-organization of ours has people onsite in their Sydney office who may be doing database operations. They might be copying a 25 GB database across the network. We can immediately tell the head of operations there that they've got an issue because this particular person is copying this database from this source to this destination and that this is the reason that all the network bandwidth is being used.
In addition, the insight that the solution provides us as a result of its correlation of traffic flows and metadata is invaluable. As a network engineer, I don't understand how people operate without it. Without that sort of visibility into what's actually going on in the network, you're running blind. There are other very similar tools in the marketplace, but nothing comes close to the Plixer solution.
Another way it benefits our organization is that it gives us the ability to identify faults and rectify them quickly. It allows us to look at the way people operate in the environment. For example, people were moving around between PCs in a hot-desking scenario, with full home-drive sync and full email sync on. That was consuming a lot of bandwidth across the network. I was able to work with our Exchange teams and Windows teams and explain to them that they should turn off the full email sync and do headers only, and that they needed to stop syncing the entire H drive. Some of our end users had up to 25 GB on their home drives, so when they were moving from PC to PC in a hot-desking scenario, that was crazy. We could see that they were consuming all the bandwidth constantly on this particular link. I would estimate that we have improved bandwidth availability by at least 25 percent, throughout the entire day. That's the sort of value we get out of the tool. We knew it was happening, but the ability to prove it to the business units and say, "This is what's actually causing the problem," is just invaluable.
Moreover, we previously had a 1 Gbps DCI between our two data centers and we could quite clearly see that it was running at 100 percent the entire time. It got to the point, with the backup solutions running between our primary and secondary data centers, that they were never able to catch up. Using that information, we were able to make a case to our business that we needed to increase our DCI from 1 Gbps to 10 Gbps. That improved backup performance and backups were able to complete successfully. The business is able to continue without worrying about backups failing.
We're quite unique within Australia because our data sovereignty laws require us to have an on-premises control plane. The customers I've been working with mostly use off-prem or cloud-based control planes. Because we'd set up our vSmart/vManage inside our own data centers, it was unique. Only about 5 or 10 percent of their customers actually had that capability. So being able to give them access to our environment to help develop the solution allowed them to move forward and provide relatively good visibility, visibility that enhanced what came out of the vManage control plane. That helped us to proactively know when the SD-WAN topology changes. In vManage, we knew events were occurring, but the Scrutinizer solution allowed us to visualize that in a graphical format and to show the business how telephony calls, video, or business-critical applications were being moved between links, based on the real-time performance of those links.
As a result, the first thing we did — because we had a combination of fixed wireless and fibre — was to go back to our service provider and say we didn't want any more fixed wireless. Most of our branch sites were dual MPLS. We did have a sub-unit that was franchised using Ethernet solutions, but our dual MPLS connections were provided by fibre, primarily, with fixed wireless as a backup or alternate link. We could see quite clearly that our data was constantly being moved over fixed wireless due to issues with the way the radios were deployed or tuned. As a result, the service provider went back to its fixed wireless division and made them do some work to improve the service.
Scrutinizer has also helped to reduce the time to resolution, especially for network events. Without some sort of application visibility and control system, you have no visibility into what the problem is. All you have is your best guess. Having that recorded data, and being able to play it back and look across time at bandwidth utilization, enables us to show problems to the business and eliminate them immediately. I had it on a big screen next to the operations section. As soon as something went red, we clicked on it and we understood the traffic flow that was causing the problem. And if it was not legitimate, we were able to go directly to that end user, because we had it tied into our AD, and tell that end user to stop doing what they were doing or to do it outside business hours. Now, our mean time to remediation is about five to 10 minutes, maximum. Without Scrutinizer, we'd be best-guessing for hours on end. When you look at what's going through a router, all you see is the percentage usage on the interface. You can't look at per-flow analytics.
What is most valuable?
The whole package is valuable.
Personally, as a network engineer, the ability to identify what traffic on the link is consuming all the bandwidth at any given time, and provide immediate feedback to the business, is the most valuable feature.
We've also got the advanced reporting on the security side of it, not the NetFlow side. We've always had that integrated into our SIEM solution. It's one of the things you can add on top of what Plixer offers as a base package. It runs analytics over all the NetFlow and then provides signature-based recognition of problems in the network environment and provides that feedback through a reporting mechanism. We've customized it to push that into our SIEM solution.
What needs improvement?
There is room for improvement around the data they have on the website about solutions. I understand that putting a particular appliance into any given organization is going to bring its own challenges — and Plixer does do a good job of blogging about it — but they should have more templated solutions on their website. Going out and identifying how to do RTP performance monitoring with a Cisco router, or how to do application response times in an Arista data center deployment, was where most of the work was. We had to identify the configuration each end vendor needed for Scrutinizer to work. They should spend some more time documenting solutions and putting together white papers.
For how long have I used the solution?
I've been using the product since 2014.
What do I think about the stability of the solution?
It's very stable. It can go up to a year or two without a reboot. It mainly gets rebooted when I do an upgrade.
During 2015 there were a couple of releases and I had a few stability issues. That was mostly because I moved the database from a Windows appliance to the Linux back-end. It didn't quite sync across. I just deleted the maps and rebuilt them from scratch and that fixed all the problems. That was the only real stability issue we've had across the journey.
We had one upgrade that didn't go as well as it could have, but Anna was able to jump on it with our support engineer and fix it within 15 minutes. It was just a matter of reaching out. They were on the phone within 20 to 30 minutes and got it sorted for us.
What do I think about the scalability of the solution?
We're running 250 reporting endpoints across our firewalls, data center switching, the SD-WAN deployment, and our branch and campus switching — all off a VM. If I were going to run any more than that, I would probably look at a hardware appliance or a distributed model.
We don't currently have plans to increase usage, but our organization invests in a lot of other organizations and that's when we would use it more. For example, in 2016 we bought another financial organization and we had to deploy to another 10 branches with 20 appliances, plus switches. It just depends upon what the business requires. I've got good visibility across my entire environment at the moment.
How are customer service and technical support?
Their tech support is unbelievable. They're really good. I've never been out of sorts for more than 15 minutes. That's a fantastic response time, considering I'm in Australia and they're in the U.S. The guys are mostly in Maine and they jump on after hours to help me out. These guys are awesome and if I've got problems with it, I know that I can reach out and they'll sort me out immediately.
There's no comparison to some of the other vendors I've worked with. I've had maintenance with Cisco and it has taken them nine days to replace a device. It's to the point where I no longer have maintenance of any of my Cisco gear with Cisco. I've gone to a third-party.
Which solution did I use previously and why did I switch?
My predecessor made the decision. He's a very security-minded, security-focused individual. Most of the other vendors are providing a solution that looks at NetFlow analytics and that's it. Scrutinizer provides NetFlow analytics of network performance, but also provides security.
We do use Darktrace for a different reason, on top of Plixer. But the advanced reporting from Plixer provides me more detail than Darktrace. Darktrace is giving us some good DLP capability, but they serve different purposes. Darktrace is looking for more shadow-IT activity, whereas Plixer is looking at more real-time flows and analytics.
Plixer's years of experience in delivering security and network visibility solutions influenced our decision to go with them. They seemed to have a solid solution, out-of-the-box, in 2014. Back then, application visibility and control (AVC) was not widely deployed. That was pretty much the stone age of AVC, especially in Australia. There are still not a lot of people using AVC.
How was the initial setup?
It has a steep learning curve, not because the product is hard to use but because to actually deploy application visibility and control, you need to have a fairly in-depth understanding of networks, network flows, and AVC itself. In my case, it was an NBAR deployment, which is Cisco's Layer 7 deep packet inspection. You need to understand quality of service and how it all ties in. To be able to use the product effectively, you need to be a fairly advanced network engineer.
Once you've got it set up, you can then give that information to the service desk and the service desk can immediately see what's happening, without having to annoy me. Once it was set up and deployed, we were able to give it to everyone within the IT infrastructure, and the service desk, and they were able to find the problems on their own, straight away, without having to deal with the network team.
The initial deployment was a matter of a change to include NetFlow export on all my WAN routers and my internet routers. The deployment of the appliance took about half an hour, but going around and configuring all the routers took up most of the time. In a production environment, you can't just go around and make changes on devices. I had to present the change to the change advisory board, with all the paperwork associated with a particular change. Then, rolling it out across the entire production environment, where I had 80 branch sites that were dual MPLS and 40 to 60 non-MPLS, Ethernet-based connection sites, took about 100 hours. But that is not a reflection on the Plixer solution; that is a reflection on the way change works internally in my organization and the time it takes to actually do things.
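For illustration only — this is not the configuration from my environment, the names, collector address, and port below are hypothetical, and the exact syntax varies by platform and IOS release — a NetFlow export change of the kind described above looks roughly like this using Cisco Flexible NetFlow:

```
! Hypothetical names and addresses throughout; adjust for your platform.
! Define which fields are matched and exported per flow.
flow record SCRUT-RECORD
 match ipv4 source address
 match ipv4 destination address
 match ipv4 protocol
 match transport source-port
 match transport destination-port
 collect application name          ! requires NBAR for application visibility
 collect counter bytes
 collect counter packets
!
! Point the exporter at the Scrutinizer collector (example address/port).
flow exporter SCRUT-EXPORT
 destination 192.0.2.50
 source Loopback0
 transport udp 2055
!
! Tie the record and exporter together, then apply to the WAN interface.
flow monitor SCRUT-MONITOR
 record SCRUT-RECORD
 exporter SCRUT-EXPORT
!
interface GigabitEthernet0/0
 ip flow monitor SCRUT-MONITOR input
```

Each router needs essentially the same change, which is why the rollout time is dominated by change control and the number of sites rather than the configuration itself.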
The strategy was that once we got it up and saw flows in there, we then deployed it globally on all our routers. Over the years we gradually made changes. Once a year, we sit down and have a look at our quality of service and application visibility and control. That's a pretty intensive process of understanding what sorts of applications are running in the environment and then categorizing them through the quality-of-service side of the house. We then look at what we want to monitor in detail — in particular, response times for applications or real-time flows in the environment — and fine-tune the IPFIX policies deployed on our Cisco routers. That's a little bit time-consuming, but again, that's not a reflection on Scrutinizer.
What about the implementation team?
I did it all myself.
We didn't need a great deal of time with Plixer once we got it up and running. I worked with someone there for about three to four hours who gave me some more information about how to use the appliance properly. Because she was very good at what she does, I was able to take that information and deploy it immediately. It came down to working with the individual vendors' products: Palo Alto firewalls, Cisco Nexus data center switches, Arista sFlow. I had it deployed on Cisco ISR 2s, ISR 3s, and ISRs. I had it running on the Cisco 9300 and 3850 series switches, as well.
What was our ROI?
The ability to fault-find and provide business continuity and the speed to resolution has been the return on investment. People can see what's going on in the network. They're not wandering around for two to three hours, not being able to do their job because there are problems in the network. We can immediately see that this person is doing the wrong thing and we can say, "Stop it." Previously, we would have had to wait for that person to finish what they were doing, and that could bring all 2,500 users down for a period of time.
What's my experience with pricing, setup cost, and licensing?
We pay a one-off cost for the licenses, per device, in blocks of 50. We then pay an annual maintenance fee of about $15,000 Australian, which is, at this point in time, about $9,000 US, for those 250 devices. The upfront cost for the 250 licenses was about $50,000 Australian, which is about $32,000 US.
There is also the cost of the infrastructure, but that's a little bit hidden: the storage infrastructure and computer infrastructure to run it.
The price point is on par with its competitors, but you get more value for money out of Plixer because you get that security focus as well.
Which other solutions did I evaluate?
We evaluated quite a few, including open-source options. The one that came closest was the LiveAction solution, because that's what Cisco recommended at the time. But it was looking at network performance, not security. Plixer was like killing two birds with one stone. It had a better platform for network performance monitoring and gave us the bonus of security monitoring.
The way that LiveAction displays traffic between devices in a map is probably a little bit better. Aside from that, the level of data that you can drill down to within Plixer is significantly enhanced, compared to LiveAction.
Overall, Scrutinizer has much better functionality.
What other advice do I have?
The biggest lesson I've learned, personally, by using Scrutinizer, is that not many people understand what's going on in their network with their own applications.
My advice would be more around the equipment you're deploying it on, the exporters. Plixer is very easy to set up and get running. If you're going to be running more than 30,000 or 40,000 flows per second, go with the hardware version. But be aware that IPFIX exporting on Cisco devices can take a heavy toll on the CPU.
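If export CPU load does become a problem, one common mitigation (a sketch only, with hypothetical names, assuming a Flexible NetFlow monitor is already defined on the router) is to sample packets rather than inspect every one:

```
! Hypothetical names; assumes flow monitor SCRUT-MONITOR already exists.
sampler SCRUT-SAMPLER
 mode random 1 out-of 64
!
interface GigabitEthernet0/0
 ip flow monitor SCRUT-MONITOR sampler SCRUT-SAMPLER input
```

Sampling trades flow-level accuracy for lower CPU usage, so it tends to suit capacity-style reporting better than security forensics, where you generally want every flow.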
For maintenance, it's pretty much just me. It's pretty easy to keep up and running. My team can do it, but I'm the guy who handles it. There isn't a massive overhead to manage it. The things that took a little bit of time were fine-tuning data retention policies, based upon A) what the business needs to be able to fault-find, and B) the storage availability, given the number of flows in our environment, because we're running up to 30,000 flows per second.
We have about 30 users across the whole of the IT infrastructure. There are five primary users within the network team, plus me. Then we have the rest of the infrastructure team, which has about 15 people, and we have the service desk personnel, where there are 10 to 15 users.
I honestly don't think there are many areas where Scrutinizer could be improved. It's a pretty robust, out-of-the-box solution. When you compare it to other AVC solutions for monitoring purposes, it's fairly feature-packed. To use 100 percent of the features is almost impossible. For the first few years, until I became comfortable with the solution, I was only using 10 to 20 percent of them. Once I understood, and spent some time working with the team at Plixer, and they gave me some good feedback on how I could use this in our environment, that's when I started using 50 to 60 percent of the feature set. I still don't use 40 percent of the features because I just don't have a need for them in my particular environment.
I've been really happy with it. And because they're such a well-meshed organization, I've had access to everyone from my sales rep to the head of support to the VP to the CEO of the organization. I've talked to all these people over the years. They're very customer-focused. It helps you to be able to achieve your goals. As a network engineer, you don't want to be whining about your monitoring solutions. You want to be using them to worry about the problems that are happening in the network. They've taken the concern about monitoring off my plate and allowed me to focus on my job.
Which deployment model are you using for this solution?