What is our primary use case?
It's deployed at a customer in the banking environment, where it monitors the perimeter edge in the data center. It's used for visibility inside the environment as well. Traffic is currently sent to it only as TAP data; we aren't sending any NetFlow data to the system yet. We also have the NETSCOUT TruView system in place, which handles both TAP data and NetFlow to monitor the branches.
How has it helped my organization?
For some of the applications we've managed to drill down and get more granular data. Because it provides such fine granularity, down to a millisecond or a microsecond of data, you can actually get finer response-time detail out of it. That helps a lot.
It has improved visibility into some of the unified communications, through the ability to drill down into finer time increments in the packet data. We are able to search through those and get Wireshark-type views, with some extra flexibility and visibility on packet data or wire data.
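To show why that granularity matters, here is a minimal Python sketch with made-up timestamps. It does not use any nGeniusONE API; the function name and the sample values are hypothetical, and it only assumes that request and response timestamps have been exported as text with microsecond precision. Both transactions fall inside the same second, so they only look different once you work at millisecond or microsecond resolution.

```python
from datetime import datetime, timedelta

TS_FORMAT = "%H:%M:%S.%f"  # assumed export format, e.g. 10:00:00.000120


def response_time_us(request_ts, response_ts):
    """Return the gap between request and response in whole microseconds."""
    start = datetime.strptime(request_ts, TS_FORMAT)
    end = datetime.strptime(response_ts, TS_FORMAT)
    return (end - start) // timedelta(microseconds=1)


# Hypothetical request/response pairs, all within the same second.
pairs = [
    ("10:00:00.000120", "10:00:00.000870"),
    ("10:00:00.400000", "10:00:00.652300"),
]

for request_ts, response_ts in pairs:
    print(request_ts, "->", response_time_us(request_ts, response_ts), "us")
```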
What is most valuable?
The quick drill-down views are similar to Wireshark views. Those are quite nice, with the views on how you interpret some of the data. The granularity of how far you can drill down into milliseconds or microseconds is a very nice feature. It actually stores quite a lot of data in its database. It enables drilling down for reporting.
The solution transforms packet wire data into real-time data that is ready to act on. We've set up some alerts on it. We can look at the packet data, or use scenario-based alerts, and then drill down further to see what the system has picked up as an anomaly or as a scenario that it is analyzing. We can investigate it further and see how we can resolve the issue or alert on it for the client.
We received some documentation on integrating it with ServiceNow. We're looking at integrating with that, or with another vendor's ticketing system, in the near future so that we can alert on things in real time. That way there's less delay from our having to interpret the data first before we can act on it.
What needs improvement?
They could still improve the workflows and document the ones that are commonly used.
Also, if you do backups of the system or try to do configuration changes, there are a lot of different formats that you need to separately interpret. It doesn't flow nicely. With config backup, for example, there are a few variants that you have to collect. Otherwise, you have to use the system backup, which we haven't restored yet, so I don't know exactly how that process works.
There are one or two things for the grids that would be nice to have. And it would be nice to be able to change some of the metrics, here and there, on the normal overviews.
Currently it's working. We had a lot of issues in the beginning with patches that we had to load, but that was more teething trouble and learning how to configure the system. It's not quite the same as TruView, which has end-user response metrics. nGeniusONE doesn't quite do the same thing.
It's a more technical tool compared to what we're used to, or what the client is used to with TruView. For some of the stuff we've seen we have had to build multiple sections or multiple pages to get a view of the environment or branch or application.
On a scale of one to 10, the solution's ability to transform packet wire data into well-structured, contextual data is a seven. There is room for improvement. It goes back to the workflows. We don't know some of the workflows yet, and it's not something that you can just read up on in the manual. There is some material in the help manual and online, but only up to a point, after which you need to purchase extra training and services from them. You can't just read up on it yourself, learn it from A to Z, and then go further in depth with extra training or certification only if you need it. That's part of the business model, I assume.
Also, it's not always the case that the solution provides the right people in our organization with the right information in a single pane of glass view. There are times where we would want to get a different view on some of the service dashboards. We can't really get all the views that we would want on a single pane of glass.
Overall, there is room for improvement, but so far it is a useful system.
For how long have I used the solution?
We deployed NETSCOUT nGeniusONE last year around April, so it's just over a year now.
What do I think about the stability of the solution?
Currently we're running quite stable. There were a few hiccups in the beginning with stuff not working. But currently we're running more or less stable. We are running on version 6.2.2. There are a few useful things in 6.3, but we were advised not to go that route yet because it's not 100 percent stable. Our sales engineer said to hold on, just to see how some of their other clients experience it and see how many issues are still being noted in the system before we move over to that newer version.
What do I think about the scalability of the solution?
We'll probably increase visibility in future because it needs to replace TruView. Currently we are only using packet TAP data. Later on, as NetFlow and those things evolve, we will need to move over to NetFlow collectors on the system as well. Currently we're using them on TruView.
And we need to expand to some of the newer data centers that the client has moved into, as well as the cloud section. We need to expand into those as soon as the client has a bit more budget and they are happy that the system is working and the views and the consolidated views are giving them what they want. Then they'll expand more on the system.
The key thing for us is to get the VAR service up and running, which should be starting from today. They've sorted out their remote access. That took us a few months just to get into the banking environment with all the nondisclosures and security checks. We are quite happy to get that started and to see how they can assist us on the system. We want to do a sanity check on the system to see what we've missed.
How are customer service and technical support?
We have an account with them, and each engineer has an account where they can log TAC cases. Our sales engineer sees some of the stuff that we seldom hear about and assists where he can. Otherwise, we work with the guys overseas. It depends on which section of the system the case is for, for example unified communications. Cases have eventually been escalated to get assistance with configuring some of the things.
We've had a few issues with one of the InfiniStream storage units, and that took a long time to resolve. The guys are still learning some of the things on the system themselves, but that eventually got resolved. But that may also depend on the support model we took.
Once you get to the higher-tier support guys, your issue normally gets resolved quite quickly.
Which solution did I use previously and why did I switch?
We've been using TruView. We've known for a while that we would need to switch because it was an old Fluke Networks product which was bought by or moved to NETSCOUT. We knew at some point in time it was going end-of-life. We need to keep it up and running for as long as possible, at least another two or three years, until the end of the contract, and see how long it lasts after that. Slowly but surely we'll migrate to nGeniusONE as we expand visibility.
How was the initial setup?
The setup was a bit complex, documentation-wise. There is a long list of documentation just to deploy the system, with a lot of variations. There's tons of documentation. Their portals reflect all the documentation and you need to go through various sections of the documentation to find what you're actually looking for.
We managed to get it installed in a weekend. It was a relatively short time just to get the equipment in. The InfiniStream we took uses attached storage. It has an IPMI interface, which wasn't mentioned in the original deployment documents. I eventually managed to find out what the base system is: a Supermicro server. I then managed to get documentation on how to configure it and what the default IP address is. I had to configure that, because there are certain things you can't do without it, such as updating the firmware of the storage array, shutting it down, or restarting it. That wasn't on the original one-page install glossy.
It's a lot different than what we're used to in terms of the various sections that you need to configure. The workflow for some of the stuff could use some improvement. It sometimes feels like the system is silo-based or sectional-based, and that it was then all put in one system. There isn't just one place you can configure your application site or a quick-start "how-to." If you want to configure an application and then get it on your dashboard or your service views, it would be nice if it gave you an auto wizard which would say, "You want to configure an application? Okay, next." You would fill in what is required, click "next" to get you to the next step and keep on following the same workflow so that you can't really deviate. If you know which sections you want to configure, maybe then you would configure it manually, but a wizard-based workflow that's set out to be followed would be good.
As we learn stuff we've transferred the knowledge to our client and they have learned themselves as well, playing with the system. As they run into a workflow issue, then we try to assist or we contact our sales engineer to ask if there is a better workflow for some of this, and how to get to the pane that we would want to be on more quickly. For some of it, there was a quicker way, and for some of them the system is built in such a way that there is not a quicker way to get to some of the views.
It requires quite a lot of staff to set up and manage the tool; there's quite a learning curve. What we normally like to do is load it offsite, deploy the system, prepare it properly, get the base configuration on, and load at least some of the applications, but we didn't have the luxury of that kind of time. It took us a bit of time compared to what we've been used to on the TruView. We tried to configure the applications, but it's not quite the same. In workflows we've missed things here and there, things like going to a different view to associate applications to a site or an interface. We missed that at times. That's where the automated workflow wizard would help a lot, to make it easier for anyone to use the system, to climb in and start configuring it.
We're still busy streamlining and working on our alerting, to get those properly set up. NETSCOUT, from their side, is PoC-ing the VAR service to assist us for three or six months in streamlining the system, see where we're running short, and also to do system checks and see what else they're going to have to improve on the system.
We're not really using the system proactively yet because we're still trying to define some of the things. The system is not at a scale where it can monitor each and every thing. There are a lot of things in the environment that we learn about on a daily basis, as they deploy new things. There are also things that we've not heard of because some of the environments are still silo-based.
Which other solutions did I evaluate?
I don't know what the client is looking at, because they can acquire from other vendors. Because we're part of the networks team, we're more focused on the actual network component.
What other advice do I have?
It's not an easy system; it's a very technical system. There are some views that you could get for a management or objective overview. Even our client mentioned that it's more of a technical tool. That comes back to the workflows, the drill-down, and the amount of time you spend drilling down into a scenario. That sometimes takes too long in a real-time troubleshooting scenario or focus session, which makes it a bit difficult. If there's an outage in the environment they might start looking at you because they're waiting for you to provide information. I assume that will improve a bit when the VAR service comes on board to show us what we're missing and how we can set up scenarios or extra alerting on the system to improve drill-down and the time to respond to or resolve issues.
It does auto-discover some of the stuff. I don't think we've really used everything that's available. We've used some of the auto-discovery for URLs or web-related links, as it picks them up. We've used some of those and then we further define it. I'm not sure if there's another way or extra things that can auto-discover. Normally we'll get an application and environment from the client, and then we'll define it from there, or we'll use TruView to look at the NetFlow data to see what ports, for example, are being used. Then we will interact with the client to further see what is there. Or we can use nGenius' packet data and pull down what ports are being used from there. Then we can go back to the client and say, "You said port 123," for example, "is being used. We see 123 and another port. Is this other port also part of your application, or what function does it have in your applications?"
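The port check described above can be sketched in a few lines of Python. This is only an illustration: the function name and the port numbers are hypothetical, and it assumes the observed ports have already been pulled out of the packet or NetFlow data into a plain list. It does not call any nGeniusONE or TruView API.

```python
def compare_ports(declared, observed):
    """Compare the ports the client declared with the ports seen on the wire."""
    declared_set = set(declared)
    observed_set = set(observed)
    return {
        # Ports the client told us about and that we also see in the data.
        "confirmed": sorted(declared_set & observed_set),
        # Ports we see in the data that the client did not mention.
        "undeclared": sorted(observed_set - declared_set),
        # Ports the client mentioned that we have not seen yet.
        "not_seen": sorted(declared_set - observed_set),
    }


# Hypothetical example: the client said port 123, and we also see 8443.
result = compare_ports(declared=[123], observed=[123, 8443, 123])
print(result)
```

Anything that lands under "undeclared" is what we would take back to the client to ask whether it is part of the application.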
As for whether nGeniusONE helps us to get to root cause quickly, it's "yes" and "no." It comes back to the workflows that we're still learning, or we may not have the correct workflow. We've learned quite a lot over the last year or so, but there is some room to improve, or there might be something that we don't know about: how to navigate a bit faster and better. One thing the client did say, if you compare it to TruView, is that with TruView you get to most of your issues in three clicks. In nGenius you need a few more clicks just to get to where you want to be. And sometimes you need to take a different route through the system to navigate to a different view.
When it comes to seeing a measurable decrease in mean time to repair, or mean time to know, there might be some workflows we're missing, that we don't know. We've used the system now for just over a year, and we're constantly learning new ways to configure the system and new workflows and how to improve our troubleshooting time. But compared to our older TruView system, it takes a bit longer to navigate to certain sections of the system or down to where we want to be, to the packet data, or to drill down into some of the applications.
We use nGeniusONE for Microsoft Teams. There is a case that we want the VAR service to take on for us to tie up some of the communications from external to internal Teams calls as they pass through the firewall. We're going to look at that to see what the VAR can assist us with. The client needs to expand on some of its TAP-ing visibility as well when, in the near future, they change their design.
As far as I know the solution has not enabled us to consolidate tools, because our client uses various systems. An example is Dynatrace, which they use on internal banking applications for Layer 7 and agent-based monitoring on some of the servers. And we still use TruView. They're also constantly expanding to see where they can add something to fill in gaps. They're busy PoC-ing ThousandEyes to get some visibility on a different front. On the network side, we monitor the network components to clear that and make sure that it runs, or assist if there are notable response-time issues, to try and work out where the issue is located.
At our company, which is on the vendor side, we have about five or six users. In our client's organization the number expands every now and then, but currently there are about 50 users, maybe more.
Because of COVID, everything is standing still currently. We started building grids and consolidated views to see what we can display on the centralized screens to improve visibility for Office 365, and those types of things. We would like to get that extra NOC-type of visibility, or an overview of the environment for certain sections. The client's strategy was that the more people that have access to the system, the more people will call us to inform us that there's something wrong in the system or in the environment, before that system even alerts us. The user base plays a big role in how the organization runs.
Which deployment model are you using for this solution?