What is our primary use case?
Our primary use case is troubleshooting. If a user starts to complain about something that's not working, we have a look into it.
It depends on whether the traffic is already in the mirror or not, because that's a problem at the moment. We have most of our servers on ESX machines at the moment, and each of these machines has dual 10-gigabit links. Putting them all together with a mirror onto only one 10-gigabit link makes it drop packets. I usually don't do that. I usually only put one or two of those servers into the mirror to avoid too much packet loss.
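To put rough numbers on the mirror problem described above, here is a minimal sketch of the worst-case arithmetic. The function name and defaults are made up for illustration. It assumes every link runs at line rate in one direction, which real traffic rarely does, and mirroring both directions would double the figures.

```python
def mirror_oversubscription(servers, links_per_server=2, link_gbps=10.0, mirror_gbps=10.0):
    """Worst-case ratio of mirrored source capacity to mirror link capacity.

    Illustrative only: assumes all source links are saturated in one direction.
    A ratio above 1.0 means the mirror link can be forced to drop packets.
    """
    source_gbps = servers * links_per_server * link_gbps
    return source_gbps / mirror_gbps


for n in (1, 2, 10):
    print(f"{n} server(s) in the mirror: up to {mirror_oversubscription(n):.0f}x the mirror link")
```

Even one dual-linked server can in theory exceed a single 10-gigabit mirror, which is why keeping the mirror down to one or two servers limits the loss rather than removing it.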
In the future we should go for a more distributed version, where we have Skylight on each ESX server so we don't lose any information at all. But at the moment, we have just one big appliance, called Site Extra Large, and we are running 5.1.3R1.
How has it helped my organization?
In terms of network troubleshooting, not so long ago a new company started up (its IT is hosted at our location), and one of their people was complaining about QoS trouble with VoIP telephony. They had decided to go with a cloud telephony solution, a decision in which I had no say. I had to tell them, "I have no idea how this thing works. I have no visibility into any of the management of that solution." They were complaining about bad voice quality. Normally, I shouldn't have been able to know anything about it. But I thought I would just put Skylight on it for a moment to have a look at what was happening. I changed the mirror for a while so that Skylight could see the traffic, and it didn't take long to figure out that there were a few agencies that the company was using that had, at certain moments in time, some notable packet loss and some bad MOS scores on the VoIP part.
With that information, I was able to go to the company that gave us the SIP trunk and tell them about it. They changed some parameters, because there was something wrong on their side, and it was fixed.
Without that hard information it would have been pretty difficult to find things like that. Skylight is not used on a daily basis, but when it's used it usually helps to fix a problem pretty fast. I don't really have to look into multiple devices because it captures most of them. With that information I can usually say, "Okay, why is your server slow? Maybe you should not have all the thin clients asking every ten seconds about some server." If you have 1,000 thin clients, suddenly you have 100 requests every second.
That's what you can easily see with Skylight.
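The thin-client arithmetic is simple enough to check in a couple of lines. This is only an illustrative sketch. It assumes each client polls at a fixed interval and that the polls are spread evenly over time, and the function name is made up.

```python
def aggregate_request_rate(clients, poll_interval_s):
    """Average requests per second hitting the server when every client
    polls once per poll_interval_s seconds (evenly spread, no bursts)."""
    return clients / poll_interval_s


print(aggregate_request_rate(1000, 10))  # 1,000 clients polling every 10 s
```

If the clients happen to poll in sync, the peaks will be well above this average.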
I have no idea how much time it has saved when it comes to response time, but I am comfortable saying "a lot."
In terms of helping improve interaction between our network, server, desktop and database teams, we're all in the same boat. It has helped me a lot when they have had serious issues. It helps me to say, "You came to me with an issue, and I found this information, which points to a specific problem in a specific team." It helps them, of course, in figuring out why something is slow or not working. It helps us save time. We can look for a solution, not the problem.
In the past, we had some serious loads on a file cluster, which was mainly due to a few procedures doing some bad things, like passing through an entire directory and asking for every file in the directory tree one by one. While doing that, it continually opened and closed all the transactions. So it was incredibly slow and incredibly heavy, because of one bad procedure. Skylight helped us a lot at that point in time, especially the server team, to figure out why the file cluster was slow. In the end it was a simple procedure creating havoc on the file cluster. That is hard to find when you can't see CIFS traffic.
What is most valuable?
One of the first things I'll do is look into bandwidth. With the bandwidth, I usually already have an idea if there is an issue or not. And I'll have a look into the errors of course.
If it's something related to HTTP or VoIP, then I can have a quick look into the protocols, a process which gives me some good ideas too.
The response times, with the performance, are really interesting too, where you can see the packet loss.
For how long have I used the solution?
More than five years.
What do I think about the stability of the solution?
We had some issues. One of them was with Nexus, but that was a missing feature. I don't think you can call that a stability issue.
We had one issue which was pretty annoying. The slave Secure Active was running at our new data center and, when we rebooted it, it just didn't work from the get-go. We needed to boot it again. I never really figured out what it was. Nowadays, it's not doing that anymore, so I guess they found the reason why it was happening.
We also had an issue with licensing when we did an upgrade: All the licensing was gone. We needed to request new licensing and they said, "We're very sorry." That took a few days but it wasn't really a stability issue either.
Once we lost all the information about the data. I didn't truly care about that because the data from the past is not that important for us. They felt really bad about it, and they must have fixed it, because we have never had that issue again.
So these are not really stability issues, and every time there was something they were spot on; maybe not immediately, but they fixed it.
What do I think about the scalability of the solution?
When it comes to scalability, there is never enough, of course. I can't put all my servers on it, but it's a lot better. It's an issue at the moment for us. We now have one big box, and the change is not a problem related to Skylight; it's more an issue for our company. When we bought new appliances, most of the servers were not really virtualized yet. Not much later, we went for a totally new data center and a totally new way of working, where it was becoming mandatory to have it all virtualized.
Now, pretty much all of the servers are virtualized, which is great because if something is down you can just put it on another server with vMotion, and it works fine. But if you don't have Skylight running on each of those servers, or some mirror on each of those servers, then you are not capable of capturing all the traffic, because the virtual server can move to another host at any time. It's a lot harder at the moment. We have to juggle things with the other teams and make sure we're all on the same page, so that they don't move the server at that point in time.
So scalability is pretty bad for us at the moment, because our modus operandi has totally shifted. We now need a more distributed solution, which is something we'll probably look into in the future. We are going to need to buy the licensing to get it running on each of those servers, which would be the ideal solution. Another solution could probably be going for a 40-gig port, and maybe an even bigger appliance. But I wouldn't really like it because it would still mean that we would lose quite a large amount of information, such as which server the traffic is actually coming from at the moment.
How are customer service and technical support?
All the things I wanted, they put in. That's what I like about them. They did listen to me. I got a modification for the VoIP. They changed the handling of the system because we had some Nexus 5648Q switches here, which seemed to be capturing traffic in a different way, which made it unreadable for a while. But a few months later there was a version that was capable of reading all of it.
They do listen. I hope that remains when they are absorbed by this new company, Skylight. I was pretty happy with the interaction with them.
Tech support is good. They usually respond within a day. They are highly knowledgeable, and mostly it's not coming from some theoretical stuff, it's real-life knowledge. They know how their solution works in real life, not just how it should work. I rate them really highly in support.
With every guy I have dealt with until now at Secure Active, I got the impression that I wasn't only talking to some support guy who only does support. I got the impression that most of those guys who do support do programming on a regular basis, and that some of the lead programmers sometimes do support. They're really highly knowledgeable. I don't think you could get that kind of knowledge without really knowing how it works. It's really impressive. It's definitely not Cisco's standard. With Cisco's standard, you're trying to get past the first level as fast as you can.
Which solution did I use previously and why did I switch?
We didn't have a solution seven years ago, before this one. I know that determining problems used to be difficult. The reason why we bought Skylight, which, in the past, I knew as Secure Active, was that at that point in time we had some serious issues with printers at locations where the bandwidth usage was really high. We couldn't figure out what was happening or which jobs were the issue. With Skylight we found the exact times when printer calls were made. We saw from whom and where to, from all the sites. Something was sent to that printer and then this printer. It was just a matter of looking into the printer information and finding out what kind of print job it was. We found out that PostScript was badly configured on some spoolers, so we needed to change the PostScript driver/configuration.
But before we had Skylight, we had months of issues and there was always a time crunch involved in figuring out why some agencies were slow. We saw peaks but we couldn't figure out why there were some serious peaks. That's when we figured out the PostScript drivers were badly configured, simply because some of the printing jobs were 50 megabytes or 100 megabytes for only two or three pages. There was something terribly wrong with the drivers/configuration.
How was the initial setup?
It's hard for me to say anything about the initial setup because we did it seven years ago. At that point in time, I had some help from them. But it was, in my opinion, pretty easy. This solution is for hardware network or data center guys. It's not for a simple user, who is not going to understand anything about it, at first glance anyway. But for the people it's made for, it should be easy to figure out what's going on. Putting in capture ports is easy; defining zones, VLANs, etc. is easy. As soon as you have done that, you already have something functional, and you get a lot of information out of it.
Even starting with a reasonably simple configuration is going to make a huge difference when you start using it for troubleshooting - and you are going to find issues. The question is not whether you will find issues. The question is, do you have time to figure out these issues, and are they your issues? You can fix your issues and send issues that are not network related to the other teams, and let them figure them out, if they have time.
The length of time for deployment depends. The last deployment of the new system was pretty much a copy-paste operation. It took less than one day. In the beginning it took me about a week, figuring out configuration.
That's another thing that's interesting. You had better think in advance about how you're going to configure all your zones. Zone configuration matters for the matrix, and if you don't put your configuration into an easily readable setup, it's going to be hard to get interesting information out of that matrix. And that matrix is actually quite interesting to have. It's probably very slow as well because, if you're like me, you have an enormous number of zones, a few hundred zones. It's not so easy to show a matrix with a few hundred zones in the browser. So zone configuration is pretty important. The way I do it is, I have my internet, I have my internal LAN, which I split into voice and into data. I split my data into locations, etc.
I can easily see the difference: Is it voice or data? Is it client or server? Is it internet or local? What is the location? If you don't do that correctly with sub-zones or the like, it's going to be way harder to figure things out.
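The zone split described above, and the reason a matrix of a few hundred zones gets heavy, can be sketched as follows. This is not Skylight's configuration format. The zone names, the nested-dict layout and both helper functions are hypothetical and only illustrate planning a hierarchy before configuring it.

```python
# Hypothetical zone tree: internet, plus an internal LAN split into
# voice and data, each split again by location.
ZONES = {
    "internet": {},
    "lan": {
        "voice": {"site-a": {}, "site-b": {}},
        "data": {"site-a": {}, "site-b": {}},
    },
}


def leaf_paths(tree, prefix=()):
    """Return the full path of every leaf zone, e.g. 'lan/voice/site-a'."""
    paths = []
    for name, children in tree.items():
        path = prefix + (name,)
        if children:
            paths.extend(leaf_paths(children, path))
        else:
            paths.append("/".join(path))
    return paths


def matrix_cells(zone_count):
    """A zone-to-zone matrix has one cell per (source, destination) pair."""
    return zone_count * zone_count


print(leaf_paths(ZONES))
print(matrix_cells(300))  # cells to render for 300 zones
```

Because the cell count grows with the square of the zone count, grouping leaf zones under a few readable parents keeps the top-level view small.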
It has always been me who is the only one who does deployments. My colleagues use Skylight also, but they are more into looking up some stuff. They don't really configure it.
Maintenance is also just me but it's pretty easy. If you want a new version, you go to the website. The hardest part is finding the link, where is that .bin file? Sometimes it's pretty hidden in a document. They could put it more easily in Salesforce, because now it's hidden in the release notes or in another file somewhere. And it's usually not on the first page either. Somewhere in the last pages you usually find something like, "Here's the location where you can download the file." I'm not saying it's hard. I'm just saying it's the hardest part. In a way, it's a good thing because you do have to read the release notes.
As soon as you find it, you download it and upload it to the machine, wait for it to say it's done, and then you need to reboot. Ten to 15 minutes later, it's working.
What was our ROI?
Our organization has saved money by using Skylight. While I can't say how much, it has probably saved us a few servers that we would have bought simply because we wouldn't have known of some badly configured procedures.
We've seen ROI in terms of time. Time is money.
What's my experience with pricing, setup cost, and licensing?
If you don't look into licensing, Riverbed and SolarWinds are pretty comparable. But if you look into licensing it would not be smart to go for either of them. On the pure, bare-metal basis, it's the same. But when you get the bare metal and a few basic licenses, then you need all those other licenses just to be sure that there's no issue.
They're going to tell you, "Yeah, but you only need to buy those that you need." How do I know I need them? You're going to have to know you have a particular issue. How do I know I have that issue? Or you can just wait until your users start complaining that you really have a VoIP issue. Then you should really buy the license for that issue, because there must be a voice issue.
Which other solutions did I evaluate?
We did look into alternatives a few years ago, from Riverbed and SolarWinds. You have to look at competitors. If you don't do that you're not really doing your job correctly. I wasn't really happy about doing that because I was happy with this solution, I didn't want to change. But we wanted to see what the competition had. We looked into them and realized, "Why would we want to change?" We're going to have more problems for more money, and we're going to have licensing issues, etc.
If you look into Riverbed, it's a licensing nightmare. You need to pay for every type of analysis. If there's one thing you don't want to do, it's that. When you're troubleshooting what do you need? You don't know. You've got a problem. A user complains something isn't working. You don't know why it's not working, that's why you have this solution. But then you click on a thing and it says you don't have this license. Great. So you buy the license, you can click on it, and you find the problem wasn't there. Click on something else - you don't have this license. You're going to buy stuff that you're not going to be needing, but you don't know you won't need it.
One of the great things about Skylight is you have them all, and you actually need them all, not because you have certain issues, but just to know you don't have issues with it. Just click once and see, "Okay, this looks fine. Next."
What other advice do I have?
Put some thought into how you want to organize your zone information before you start. Play around with zones at the beginning but don't keep that setup as something you want to really use. As soon as you have an idea of how it works, put some thought into how you really want to go with it in the future. Then reorganize it so that it works the way you really want it to, following that structure. Then start finding some issues. And trust the other teams to start doing their part as well, so that you start to get a clean network. Having a clean network makes it way easier later on to find issues. I thought I had a clean network. I didn't.
If you already have ten or 20 issues that are all doing bad stuff, it's going to be way harder to find out why this new thing created new issues, because you already have a lot of issues. Cleaning up the issues after you've used Skylight is pretty important - and you're going to find issues.
I don't think it helps me with minimizing downtime. If the network is not working, Skylight is not going to help either because it looks into the traffic from the network. If the network is not working, I'm not getting much information.
We don't use Skylight for performance and traffic monitoring of cloud environments at the moment. We are not using any cloud solution at the moment on a production basis. It's only in some test cases. It might be possible in the future, and then Skylight might be quite handy. I have no idea how it works, but I saw that Skylight has some new stuff about cloud.
What's interesting also about Skylight is that it's not going to show you the issue that may be the cause of your troubles at that point in time. It's going to show you all the issues that could be the cause. Usually there are four, five, or six issues that you didn't notice simply because you had enough capacity, so the service didn't have any issues. And then you'll get a seventh issue and that seventh issue does start to create a problem. You're going to look for that seventh issue, not for the first six issues that were there too, when you still had enough capacity to make it work.
It's not going to show one issue. It's going to show them all, which is a great thing and a bad thing. It's going to be clear that you don't have one issue, you have multiple issues. Of course, when you fix them all, you're going to be really happy and you'll have a much better network. If you want to do it right, it's going to take you a while to fix your issues on the network. You don't have to. You can always say, "I don't care about these and those issues. They're just some side things and they're not really important." If you don't have the time, don't fix them. Just keep in mind that you still have those issues.
In terms of direct users, it's just me and my two colleagues. All the users at my company total around 1,000-plus people. They get use out of it because the network keeps on working.
It's not really used heavily at the moment. We use it when there are issues. Most of the issues have been resolved. It has become a troubleshooting tool now. When there's an issue, we look into it. And that's when it shines.
I could use it for other things also, but I don't. We're not really looking into it for that either. For me, it's just a Swiss Army knife for a lot of things. Something I have next to my other tools. I use a latency monitor. I use a bandwidth monitor. I use Netdisco, which is an inventory manager for all MAC addresses. I have Skylight. I have five or six different kinds of network tools, and Skylight is one of those specialized tools that I can use for a lot of stuff at the same time.