What is our primary use case?
I am in what's called the "data explorers," which is our organization's free-form, "write your own database query with a GUI" tool for getting some numbers out. I use it because I'm usually looking to solve very specific problems or to get very specific questions answered. I'm very familiar with the GUI and it does what I need it to do.
For our company, one of the major uses of it is in our sales organization. They run a lot of customer prospecting using it. Using the API stack, we ended up writing our own, internal sales tool webpage which does a lot of queries on the back-end to get info from the on-prem database.
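That internal sales page boils down to the same pattern anywhere: build a query, POST it to the flow platform's API, render the result. The sketch below shows the shape of such a back-end call. The endpoint path, the `X-CH-Auth-*` header names, and the payload field names follow Kentik's v5 query API as I understand it, and the host name is a placeholder for an on-prem install; treat all of them as assumptions and verify against the current API documentation.

```python
import json
import urllib.request

# Hypothetical on-prem API host; substitute your own cluster's address.
API_URL = "https://kentik.example.internal/api/v5/query/topXdata"

def build_query(dimension: str, metric: str = "bytes",
                lookback_seconds: int = 3600) -> dict:
    """Build a single top-X query payload (field names are assumptions)."""
    return {
        "queries": [{
            "bucket": "sales",
            "query": {
                "dimension": [dimension],           # e.g. a geo or ASN dimension
                "metric": metric,                   # bytes, packets, ...
                "lookback_seconds": lookback_seconds,
                "topx": 10,                         # top 10 rows
            },
        }]
    }

def run_query(payload: dict, email: str, token: str) -> dict:
    """POST the query with Kentik-style auth headers and return the JSON reply."""
    req = urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode(),
        headers={
            "X-CH-Auth-Email": email,
            "X-CH-Auth-API-Token": token,
            "Content-Type": "application/json",
        },
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

The sales page never exposes any of this; it just calls `run_query` with a handful of pre-agreed `build_query` variants and turns the rows into graphs.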
We are using the on-prem deployment. We did use the SaaS version initially for the test, to see if it met our needs, but for production we decided to go for the on-prem deployment. The reason we went with the on-prem — and I'm not involved in the purchasing aspects — was because at the level of our flows and data rates, when you're doing the cloud solution you're also paying for the hardware. I believe it was determined that a one-time cost for us in buying the hardware, and then just doing the software license part, ends up being more cost-effective. I cannot speak to anyone else's particular pricing model or the amount of data they're sending. That may make it a very different equation. I have a feeling that the on-prem would really only work for the really large customers.
How has it helped my organization?
For our organization, the sales-prospecting is really invaluable. We had a previous tool that I wasn't really involved with but which was, to my understanding, very hard to use and which was — I won't say misdesigned — but designed strangely. With this tool I have been able to work with some of the front-end sales-developer people to tighten down the queries that they wanted to use to get the information out. Once they had that, they could go into their sales portal and put them in there. I can help them with the information because I know what it's coming from. I help them make queries: for example, "The customers in New York who are going to Chicago." Whatever that turns out to be, I know what it is. Whereas, with the other tool I didn't really know necessarily how it was working along its model.
We also have alerting from it for attacks and capacity utilization, which we didn't have before. The great thing about it is that it doesn't say, "Okay, this link overloaded," but it does what's called planning or trending. It says, "Hey, this IP usually has ten hosts talking to it. In the past hour, it has had 10,000 hosts talking to it." It will show things that might not necessarily be a situation where something is being overloaded, but which are still events that happened on the network and which we wouldn't have seen before at all.
Kentik has also helped to decrease our mean time to remediation in the case of attacks. We're able to pull out the IP that's being attacked and take action on it. Before, we couldn't find that out easily. That process has gone from slow to fast. Attacks happen no matter what. We have a lot more visibility into them, we can see where they're coming from, and that has definitely helped us take action against some of our customers who were continually launching attacks. Maybe it has decreased the number of attacks, in that we found out which customers were doing them and terminated them. But the tool itself doesn't help us reduce the number.
What is most valuable?
Having access to the flow. It gives me the Ultimate Exit type of data, which I wouldn't get in a basic flow-analysis engine. Also, I am able to do a lot of work on the visualization end to create different visualizations and different ways to get information out of it.
The real-time visibility across our network infrastructure is good.
The drill-down into detailed views of network activity helps to quickly pinpoint locations and causes. All the information is there. As an organization, we're still trying to figure out the best way to use it across all different skill levels. I worked with some of the sales developers to get a sales view. I'm working with the NOC to get a NOC view, because it is a very information-dense product. Someone who doesn't know what they're doing will easily get lost in it. But we can set up dashboards and views for people who aren't as skilled with it, so they can answer questions easily.
What needs improvement?
I would like to see them explore the area of cost analysis.
For how long have I used the solution?
We started with a trial just about two years ago and then we signed the contract for it at the end of 2017.
What do I think about the stability of the solution?
The stability is fine. It's a software product, so there are going to be issues. There are problems that happen with it occasionally. They notice it, they send out a message, and it gets resolved. But I have no qualms at all about stability.
As far as the hardware goes, I can't speak to that. Hardware is as hardware does, but I presume they have enough stability or excess — spare capacity — in our cluster that I don't hear about anything. Every once in a while I'll hear that a fan died, a hard drive died, but there is no impact to the function of the platform.
What do I think about the scalability of the solution?
The scalability is great. We've had no issues with it. Our network is very large.
Obviously, you want to be a little cautious, or at least a little aware, of what you're doing. It's the same thing as when you use a database. If you run a query, "Show me all zip codes starting with two," you're going to get a huge number. What you really meant is, "Show me all the zip codes starting with two in Maryland." That's a very different query and it will get you a much faster response, because you're only looking in Maryland. Without someone to guide you through that process, someone who knows what a database does, it's very easy to write bad queries.
One of the great things about this product is that it takes away that "middleman," that developer between the user of the tool and the raw database. At many companies, you have the database of customer information, for example. Then you have the users of that data who need it to make tickets and resolve issues. And in between them, there's a developer who figures out what the customer service people need to know: "Oh, you need to know all tickets of this customer in the past week." Or, "You need to know all the tickets that are open right now." The developer pre-writes those queries for them so they don't have to do it. What Kentik does is it eliminates that layer. I can slice data any way I want on the platform. But with that comes the caution that, if I write a query that is stupid, it's going to take a long time. There are ways to write queries which are smart and ways to write queries which are stupid. That's where it does take a little bit of time to understand how it works. Once I know how to do it, I can easily help other people make dashboard queries so that they don't need to know that.
How are customer service and technical support?
Tech support is very good. They have a form on their front-end where you can submit a problem request. The cool thing about it is that it takes a snapshot of the query that's being made, so they can immediately see what you're looking at. If you have a problem like, "Hey, why does this graph have this jump here?" they will see that right away, and then you can go back and forth with them. I've been working with them a lot on different issues and I've always had very good support from them.
Which solution did I use previously and why did I switch?
The previous tool we used was an internal module we developed. It was very sales-driven, it wasn't very good, and it was not our main expertise. We had a programmer-and-a-half working on it. There were two parts to the problem. One part was data ingest at our scale: how do you ingest this much stuff? The second was how do you visualize it? Those are both hard problems, and they are different from one another. Our skill set wasn't really strong in either one. It was easier, and made more sense, to outsource those aspects to people who do know how to do that.
How was the initial setup?
The initial setup was straightforward. We had the on-prem deployment, so they sent us a list of the stuff. I wasn't involved in the setup of the hardware, but our data center guy said it was straightforward. You put this rack here and plug this in. And as far as the computer equipment goes, the great thing about NetFlow is that it is a very standard industry protocol. It is what it is and it's pretty much done.
In terms of how to best utilize the information you have and what you know about your network, and how to give it to the platform in a useful way, that is still very easy for a network like ours. But someone who is a lot less rigorous about their internal topology, descriptions, or other meta-information may find it harder. You don't need to be doing best practices, just reasonable practices. If you're already doing reasonable stuff, it'll be okay. But if you don't have very good standards for your network in terms of descriptions and the like, you're going to have a bad day. Then again, you were already going to have a bad day; it's not fair to knock the platform for that. There needs to be some way to get that meta-information into the platform, to be able to say: What's a customer? What's a peer? What's a core link? If you can't do that, then you have other problems.
We signed with Kentik at the end of 2017. There were a couple of months where we were spinning up the hardware, during which we didn't really do any setup. They sent us a list and we did some due diligence to make sure that we made the right buys, etc. It's going to be different for an on-premise versus a cloud solution. But once we got up and running, things went very, very quickly.
If you follow good practices in your network, it's very easy. If you have a very sloppy network with bad descriptions, where you can't write a rule that says a description that starts with "customer" is a customer and a description that starts with "core" is a core, because they're all just "port to go," you're going to have a bad time. That's really work that needs to already be there in a good network. Our network was already designed along a standards basis, so our setup was very fast. It took weeks, if not days. Once we put the first few routers into the platform to make sure how the API behaved, we were able to run all the rest through.
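That prefix rule is simple enough to sketch. The snippet below shows the "reasonable practices" point: if interface descriptions follow a consistent prefix convention, classifying them into the connectivity types the platform needs is one rule per prefix. The prefixes here are illustrative examples, not our real naming scheme.

```python
# Example prefix-to-connectivity-type rules; adapt to your own convention.
PREFIX_RULES = [
    ("customer", "customer"),
    ("peer", "peer"),
    ("core", "core"),
]

def classify_interface(description: str) -> str:
    """Map an interface description to a connectivity type by its prefix."""
    desc = description.strip().lower()
    for prefix, conn_type in PREFIX_RULES:
        if desc.startswith(prefix):
            return conn_type
    # Sloppy or non-conforming descriptions land here; this is the
    # "bad day" case where no rule can recover the meta-information.
    return "unclassified"
```

If your network already names things this way, feeding the platform is trivial; if every port is just "port to go," no tool can recover the classification for you.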
It took one to one-and-a-half people for the setup, excluding the on-prem hardware installation, which I wasn't a part of at all. I'm not a developer, so we had a developer who did the API work to add them into the platform. I guided that API work. It's really not that complex.
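The API work was essentially turning a router inventory into device-creation calls. The sketch below shows that shape as a pure payload builder plus a loop. The field names ("device_name", "device_sample_rate", and so on) and the `/api/v5/device` endpoint are my recollection of Kentik's v5 device API; treat them as assumptions and check the current documentation before scripting against it.

```python
def build_device_payload(name: str, flow_ip: str, sample_rate: int = 1024) -> dict:
    """Build the JSON body for creating one device (field names are assumptions)."""
    return {"device": {
        "device_name": name,
        "device_type": "router",
        "sending_ips": [flow_ip],            # address the router exports flow from
        "device_sample_rate": sample_rate,   # must match the router's configured sampling
    }}

# Onboarding the whole network is then just a loop over the inventory;
# each payload would be POSTed to the (assumed) /api/v5/device endpoint
# with the same auth headers as any other API call.
routers = [
    ("edge-nyc-1", "192.0.2.10"),
    ("edge-chi-1", "192.0.2.11"),
]
payloads = [build_device_payload(name, ip) for name, ip in routers]
```

Once the first few routers confirmed the payloads were right, running the rest of the inventory through was mechanical.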
What about the implementation team?
We did not work with a third party. You absolutely do not need that. This is not like HP OpenView or Salesforce, where you need people who know the system to get going. The API is very easy to use. There's not a ton of Salesforce-level business logic in it.
Basically, it's a NetFlow collector, ingester, database, and a UI front-end to make reports. While it depends on how much in-house capability you have, most people should be able to do this without a problem.
What was our ROI?
I can't give you numbers. It's something which is very hard to quantify. I have no idea what the investment is, and how do you calculate the return? Is the return that a salesperson closed a deal that they wouldn't have before? I'm sure somebody could calculate it, but beyond "good," I wouldn't know what to tell you about ROI.
What other advice do I have?
It's a great product and the company is great. The company iterates and they move fast to add new things. When we did our first trials, almost two years ago, more than once, although not routinely, I would see a missing filter set. You could filter on an element here, but you couldn't over there, and that's an error. I would put it in as a request and it would get resolved, sometimes in hours. The responsiveness is great.
It really takes a while to figure out the best way to use it for yourself. There is just a ton of information in there. Don't get dissuaded at first. They will help you work through it. You will need to understand your network, and you will understand your own network very well by the end of it.
The biggest thing Kentik has given us is the amount of visibility we have into our own network now and knowing what's going on at a given time. It's giving us the detailed NetFlow records and the ability to visualize them with different tables, with different timeframes, with different filters. It really provides a lot of detailed information on what's going on right now that we just didn't have. It may not have changed the business, but being able to know, "Hey this customer is always sending traffic to Spain and they stopped for two days. And then they started again." What's going on with that? The more information we have to give to our staff about what's going on in the network, the happier the customers are.
Things are moving in different directions at a global or industry level: Old-ops versus AI-ops versus DevOps, etc. We are not a very large company, so a lot of these other things are, to my mind, kind of "buzz-wordy" to an extent. I know that they're moving in that direction with V4, which is good, but for me I just want to know exactly what I'm putting in and what I'm getting out. A lot of solutions I've seen in the past have been very "hand-wavy." They will tell you when they see trends, but they aren't really good at the stuff I need to do. I need to say, "Hey, this line is overloading. What do I need to do right now?" It's been really great to have the ability to go in, put down exactly what I want to look at, and get the answers right away.
Of course, if you want to answer more complicated questions, there are limits to how easy you can make it. If you are asking a very complicated question, you need to know what you are doing. It's like any real intelligence or NetFlow analysis tool. It can get complicated if you're asking complicated questions. But the great thing about it is that it does expose all that functionality. So if you want to ask a very complicated question, you can do that.
In terms of the solution's months of historical data for forensic work, we don't have much call for that level of analysis. We do have the 60- or 90-day retention for the full data set, but we haven't had the need to go back that far at that resolution. That doesn't mean we won't. But as of right now, if we do have an issue with abuse where we need to look at the full data set, we're doing that within a couple of weeks or even a week. So for us, it has not been a plus that we've had the whole data set for the 90 days, but that's what we decided to do when we went with the on-prem.
We don't do any public cloud. We don't use it for any threat stuff like that. I could see an enterprise using it to be able to pull in stuff from Google Cloud or Amazon cloud — known routers. We don't do any of those kinds of things. We're trying to figure out the way for us to do it, to feed the list of customers who have bots on their network back to our abuse team to handle. We have that information available to us. We just need to figure out the right way to handle it as an organization.
We're a very small personnel organization and we don't deliver a lot of products. We just deliver one product. We don't do security or cloud.
I wouldn't say it has helped to improve our total network uptime, but we're not really using it for that purpose. Obviously in an attack we would see what happened, we can see traffic shift and take action based on that traffic. But I wouldn't call that actual downtime. Things got overloaded and now they're not overloaded.
In terms of maintenance, it runs by itself. For the on-prem, we bought the hardware and they monitor the servers because it's their own software running on it. We'll get a mail saying, "Hey, the fan on this is acting funny," and we can just swap it out. Beyond that, there really isn't maintenance per se.
I just say to the business units, "Hey this data's here. What do you want?" I sit down with them and figure out what they want and how often they want it. I then walk them through how to use it for themselves. One of the great things about it is that it's really just a front-end GUI to a database. But that's also a downside of it because it's really complicated. Someone who doesn't know anything about a database is going to have a hard time. But, if I sit with someone in our company and say, "What is it you want to know?" I can walk them through how to do it and, at the end of it, leave them with a dashboard and they can do it. It really depends on their own initiative and what they want to use it for.
The number of users in our organization is less than ten. The sales team is now starting to use it, but they're not really using the product itself. They're using our internal sales page, which makes API calls to the back-end to get graphs. They're not really users in the sense of using the UI, making queries, making dashboards, or playing with all the parameters. They just have a constrained view that the sales development organization defined: "This is what I want them to know. Give it to them." Those few hundred salespeople are using it, but they're really just consumers of the data that I, in consultation with the sales development people, said, "This is the data you're getting." Beyond that, there are a few in the NOC, people in abuse, people in planning, and me, who use it for different purposes.