If you were talking to someone whose organization is considering Kentik, what would you say?
How would you rate it and why? Any other tips or advice?
The biggest lesson in using Kentik is that as we continue to use it and learn more, we learn about the use cases that are valuable. Initially, when I came over to the team, we weren't using it to its fullest capabilities. As we started to understand the capabilities and dive into specific areas with Kentik's own customer success engineers, we learned that we needed to change our thought process a little bit: how we thought about flow logs and what they could provide insight into. My advice would be to leverage their customer success engineers upfront and don't let them go until you've hit all your use cases. Constantly be in touch with them to understand what some of the forward-thinking ideas are and what some of the cutting-edge use cases are that their other customers might be getting into. We don't make use of Kentik's ability to overlay multiple datasets, like orchestration, public cloud infrastructure, network paths, or threat data onto our existing data. That is something we're evaluating. We're currently talking with a couple of teams that are moving to AWS, teams that would like to use Kentik to potentially capture VPC flow logs and overlay that with their application performance data. That is something that is currently on hold, pending some other priority work. We will probably dive back into that, with that team, around mid-2020. For maintenance, it requires less than one full-time engineer because it's a SaaS model. In terms of overall vendor partnership, I'd give Kentik a nine out of 10. They're right up there as one of my best partners to work with, amongst all the contracts that I own. They're very customer-centric. They're always available. There's nothing too small or too big that I can't ask them to help with, and they seem to be willing and able to jump in no matter what.
That customer focus is a theme across the digital world right now, with companies trying to do more of it, and Kentik does a really good job of embodying it.
Go for it. The other solutions out there just don't compare. It has definitely been worth it for us. Anytime anyone asks us, we definitely recommend it. We were expecting to be able to see and understand more about our traffic. I don't think any of us thought we would rely on it as much as we now do. We have looked into making use of Kentik's ability to overlay multiple datasets onto our existing data and it's something we are thinking about. We're just not there yet within our organization. It gives us visibility into stuff going on in our network but I don't think it necessarily helps uptime. Where it could help uptime is for specific customers when it's DDoS-related. It helps us quickly determine what's going on with DDoS, where we couldn't have before. But for our network, as a whole, it just allows us to see what's going on. It doesn't do anything itself. It doesn't reduce the number of attacks that we need to defend against. The internet is a wild place. With a network of our scale, there is something under attack literally every minute of every day, every day of the year. What it does is allow us to see quickly, immediately, what is actually going on, and then take action around that. I rate it a nine out of 10. We're happy with it.
My advice would depend on the network and what your use case is, but I would not underestimate the importance of how easy it is to use. If I were to sell this product to someone else, that's exactly what I would tell them: how easy it is to use. Easy tools get used. If you have a beast of a system where it takes 20 minutes to get the query out, then you're probably not going to use it as much. The biggest lesson I've learned from using Kentik is that when it's easy to drill down into data, you tend to do it more. We have spotted so many things that we would never have spotted if this had been a less "real-time-ish" product. Collecting data is usually very simple, but presenting it in a good way, such that people can actually access it and model it as they want, is the tricky part. Having a tool that is as easy to work with as Kentik gives the team motivation to add more stuff to look at. We don't use its months of historical data for forensic work. We're using it as a real-time snapshot. You can buy the ability to go back further in time. With our license we only have the 30-day period but we rarely even look at 30 days. We usually look at a week to get the cycle of the traffic peaks that we have when people use our service on the weekends. That usually gives us a pretty good average for a month. Of course, we have other tools that we have built ourselves to do more long-term analysis, if we want to see how our traffic has grown. We also don't make use of Kentik's ability to overlay multiple datasets, at least today. We probably should look at more of these things. We only use it for traffic management or to get an understanding of our traffic flows from the private CDN. We don't look at any threat detection. We do have a very large Google Cloud installed base where we could potentially use that, but we haven't gotten around to doing it. We have eight people who look at Kentik. They're all working in content delivery.
We don't expose it to managers or senior managers. Our structure is a bit different than some companies; we try to solve a problem very close to the problem. So it's basically my team that looks at it and they make the decisions. It's not like we have dashboards for managers and things like that. We do have the cost calculations, but we abstract that away by writing our own tooling to get the data out. It's just network engineers and the product managers for the content delivery network who look at it. I would rate Kentik a strong nine out of 10. There is always room for improvement here and there, but overall, for our use case, it's been working really well. We haven't had any real issues. I could imagine that if you have a bigger, more complex network, you could run into some issues, but we haven't. I like the fact that they come from the same background as we do and that they understand, at least from my perspective, the content part and what it's all about. They've been very easy to work with and very keen to listen to feedback. I am super-happy with the product.
Kentik has pretty good intuition, as a company, as to where the market sits and what they're into. They don't delude themselves. They really focus. They've been pretty good. I know the leadership over there and it seems like between Justin and Avi, they're good at what they do and that's why I'll continue to use them. Anywhere I go, I'm going to use Kentik if I have the chance.
It's a great product and the company is great. The company iterates and they move fast to add new things. When we did our first trials, almost two years ago, more than once, although not routinely, I would see a missing filter set. You could filter on this element here, but you couldn't over there, and that's an error. I would put it in as a request and it would get resolved, sometimes in hours. The responsiveness is great. It really takes a while to figure out the best way to use it for yourself. There is just a ton of information in there. Don't get dissuaded at first. They will help you work through it. You will need to understand your network, and you will understand your own network very well by the end of it. The biggest thing Kentik has given us is the amount of visibility we have into our own network now and knowing what's going on at a given time. It's giving us the detailed NetFlow records and the ability to visualize them with different tables, with different timeframes, with different filters. It really provides a lot of detailed information on what's going on right now that we just didn't have. It may not have changed the business, but we are now able to know, "Hey, this customer is always sending traffic to Spain and they stopped for two days. And then they started again. What's going on with that?" The more information we have to give to our staff about what's going on in the network, the happier the customers are. Things are moving in different directions at a global or industry level: Old-ops versus AI-ops versus DevOps, etc. We are not a very large company, so a lot of these other things are, to my mind, kind of "buzz-wordy" to an extent. I know that they're moving in that direction with V4, which is good, but for me I just want to know exactly what I'm putting in and what I'm getting out. A lot of solutions I've seen in the past have been very "hand-wavy".
They will tell you when they see trends, but they aren't really good at the stuff I need to do. I need to say, "Hey, this line is overloading. What do I need to do right now?" It's been really great to have that ability to go in, put down exactly what I want to look at and get the answers right away. Of course, if you want to answer more complicated questions, there are limits to how easy you can make it. If you are asking a very complicated question, you need to know what you are doing. It's like any real intelligence or NetFlow analysis tool. It can get complicated if you're asking complicated questions. But the great thing about it is that it does expose all that functionality. So if you want to ask a very complicated question, you can do that. In terms of the solution's months of historical data for forensic work, we don't have much call for that level of analysis. We do have the 60 or 90-day retention for the full data set, but we haven't had the need to go back that far for that resolution. That doesn't mean we won't. But as of right now, if we do have an issue with abuse, where we need to look at the full data set, we're doing that within a couple of weeks or even a week. So for us, it has not been a plus that we've had the whole data set for the 90 days, but that's what we decided to do when we went with the on-prem. We don't do any public cloud. We don't use it for any threat stuff like that. I could see an enterprise using it to be able to pull in stuff from Google Cloud or Amazon cloud, known routers. We don't do any of those kinds of things. We're trying to figure out the way for us to do it, to feed the list of customers who have bots on their network back to our abuse team to handle. We have that information available to us. We just need to figure out the right way to handle it as an organization. We're a very small organization in terms of personnel and we don't deliver a lot of products. We just deliver one product. We don't do security or cloud.
I wouldn't say it has helped to improve our total network uptime, but we're not really using it for that purpose. Obviously in an attack we would see what happened, we can see traffic shift and take action based on that traffic. But I wouldn't call that actual downtime. Things got overloaded and now they're not overloaded. In terms of maintenance, it runs by itself. For the on-prem, we bought the hardware and they monitor the servers because it's their own software running on it. We'll get a mail saying, "Hey, the fan on this is funny?" We can just swap it out. Beyond that, there really isn't maintenance per se. I just say to the business units, "Hey this data's here. What do you want?" I sit down with them and figure out what they want and how often they want it. I then walk them through how to use it for themselves. One of the great things about it is that it's really just a front-end GUI to a database. But that's also a downside of it because it's really complicated. Someone who doesn't know anything about a database is going to have a hard time. But, if I sit with someone in our company and say, "What is it you want to know?" I can walk them through how to do it and, at the end of it, leave them with a dashboard and they can do it. It really depends on their own initiative and what they want to use it for. The number of users in our organization is less than ten. The sales team is now starting to use it, but they're not really using the product itself. They're using our internal sales page which makes API calls to their back-end to get graphs. They're not really users in the sense that they're using the UI, they're making queries, they're making dashboards, or playing with all the parameters. They just have a constrained view that the sales development organization said, "This is what I want them to know. Give it to them." 
Those few hundred salespeople are using it, but they're really just consumers of the data: I, in consultation with the sales development people, said, "This is the data you're getting." Beyond that, there are a few in the NOC, people in abuse, people in planning, and me, who use it for different purposes.
Rely on the customer service reps. That would be my biggest piece of advice because they've got all the good tips and tricks. The user base of Kentik in our company is very small, about 15 people or less. That includes our interconnection managers, peering managers, and capacity managers, as well as my small team of software developers. For deployment and maintenance of the solution it requires one person or less. Nobody needs to make it their full-time job, which is nice. Those responsibilities are spread across several people. One of the interconnection managers helps me, for example. Overall, I would rate Kentik at eight out of ten. The fact that we can lose data whenever we have traffic spikes, which our business does pretty regularly, is what would keep any solution from a ten, because I can't always, for every data point, say this is accurate. Occasionally there's an asterisk next to the data.
Carefully analyze your routers and how much flow they're sending to a collector. I would also suggest minimizing the number of routers that have to send BGP: you want a good enough view of BGP without every router having to maintain BGP sessions. Those are suggestions for implementation. The biggest lesson I have learned from using Kentik is "don't do it yourself." At my previous company they were being very stubborn and they didn't want to use an off-the-shelf product, so I went through three iterations of a NetFlow interface trying to get it correct, and I kept telling them, "Okay, but there's a product out there that does this. So please let's stop spending all this money." And they went so far as to spend a couple of million dollars on hardware to deploy it out to the network and everything, and we still ended up going to Kentik. That is one of the biggest things I learned, that sometimes you cannot do it all. You have to go to someone who's an expert in a particular kind of big data, and that's what they are. We don't currently make use of the solution's ability to overlay multiple data sets such as orchestration, public cloud infrastructure, network path, or threat data onto existing data. But with the public cloud providers we are working with, we are looking at pulling in VPC logs so that we can see if we're getting the performance that's necessary out of our public cloud providers. That's the next step with this product for us. We're not pulling in other data sources like logs or ThousandEyes data, for instance, at this point. We did talk to Kentik about trying to pull ThousandEyes data in and marrying it with their product. But not quite yet. I hope to add that into the product as well at some point. We do use BGP as another metric to figure out what's happening with the different paths. We probably have about 30 users.
Everyone from our monitoring team is in there, so they're working with me on pulling together an interface that uses the API to pull the data out of Kentik and put it on one of our internal interfaces. That way, some people won't have to log in to get some data. It's more of an executive view for them. But some of our executives actually have access to Kentik too. We have a couple of network backbone engineering executives who have access and who do look at it. Then we have a lot of our operations team, the network architecture and backbone engineering. They all have access. It's a wide range. In terms of deployment and maintenance, there are two of us who put stuff in. I've created users. One thing we are going to do is automate getting the routers in there. I would generally suggest, and this is what I did previously, that you write scripts to do all of your updating, so that it just happens automatically for you. That's super-helpful. In this environment we don't have that many routers in it. It's about 40 to 50 routers at the moment. We mainly use it on their engines. We're starting to work with our security team to get it from data center to data center as well. That's really limited by our need for security rather than how we would use it entirely. At my previous company, when I left, we had 667 routers in it. It was used everywhere for everything. We absolutely have plans to increase usage of Kentik at my current company. I'm working with our security team to get approval to do that. I have to meet their security needs in order to expand the usage. Honestly, it is one of those products that I would suggest to almost any network operator. I would go with a ten out of ten as my rating. I have not felt like this about any other company out there. It has just been so useful for me on so many different levels, from operations to ROI. It's just helpful.
I'm working for an organization with ~1000 employees and I'm exploring the two monitoring solutions for cloud services: Azure Monitor and PRTG Network Monitor.
Please share your personal experience on the pros and the cons of each product. What would be your choice and why?