Corvil Review

Gives us a complete, real-time view of latency in our platform


What is our primary use case?

We use it primarily for latency monitoring and capturing latency statistics.

How has it helped my organization?

The product we offer to our clients is a low-latency trading platform, and the primary pitch for the product is low latency. We have clients who want to get to the market as fast as they can; that is how we have designed our product and that is what we sell. The way we constantly verify that our product is the best in the market, and how it is doing against competitors, is by measuring latency. Corvil has done exactly that throughout our product's evolution over the last three or four years. I've been using it for two years, but the Corvil installation has been with our firm longer, four-plus years.

This information that constantly comes from Corvil is what has helped us to evolve our product in terms of latency. That's primarily how Corvil has helped our product.

In the last year-and-a-half we've also extracted this information and presented it in different ways so we can pinpoint, at various points in the day, where we see higher latency or a hit to our median latency. When the latency deviates from the median, we want to know what caused it. Our application is Java-based, and Java has something called "garbage collection." As soon as garbage collection kicks in, latency spikes, and we can pinpoint these moments in the Corvil data because Corvil has statistics for each individual order throughout the day. So we can look at a certain time of day, get the metrics for that order, then go to our application logs, see what was happening at that point, and confirm that the issue was Java garbage collection.

We had a client who was concerned with outliers. There is the median latency and, at various points in time, we find outliers which jump away from the median. We can use Corvil data to identify that it was the garbage collection points where the client was seeing pretty high latencies compared to the median. We were then able to redesign our product, make some changes to the JVM parameters, and make the product perform better. Doing so made the client happy.
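As a sketch of this kind of outlier check - the data, record layout, and 5x-median threshold below are all invented for illustration, not Corvil's actual export format - flagging the orders whose latency jumps away from the median looks roughly like this:

```python
from statistics import median

# Hypothetical per-order latency samples (timestamp, microseconds),
# standing in for a per-order report; the values are invented.
samples = [
    ("09:30:00.100", 42), ("09:30:00.250", 45), ("09:30:00.400", 41),
    ("09:30:00.550", 310),  # a suspected garbage-collection pause
    ("09:30:00.700", 44), ("09:30:00.850", 43),
]

med = median(lat for _, lat in samples)

# Flag any order whose latency exceeds 5x the median; the flagged
# timestamps can then be cross-checked against the JVM's GC logs.
outliers = [(ts, lat) for ts, lat in samples if lat > 5 * med]
print(outliers)
```

In practice, the threshold would be tuned to the platform's own median and jitter rather than a fixed multiplier.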

In terms of the product's performance analysis for our electronic trading environment, as I've said, our application is a low-latency application. We have our own internally developed proprietary system that runs on our own servers. Corvil is sniffing the traffic that comes from the client to our process and on to the exchange, and it's calculating latency between these endpoints: from the client to our servers, and from our servers to the exchange. Apart from the network latency, the area where we want more control is inside our own process. As I already mentioned, it has helped us improve our product.

The solution helps to correlate individual client or trade-desk transactions to infrastructure and venue latency. As long as the client message or the trader message has the relevant information, you decode it and you have that information. So we have statistics based on client and statistics based on exchange. With the new introduction of the Intelligence Hub, which we are still reviewing, you can break it down by any number of parameters, such as symbol or site.

We have all that information. In fact, we were able to break down the information by client because we have a particular instance of our process that just handles client traffic. We get multiple clients sending into the same process and we can see, by visualization, that one client seems to have better performance while another seems to have slightly degraded performance. Why is that the case? We were able to drill down into it and saw that the client who was doing better was trading on a different market that uses a particular protocol, FIX for example. The other client, trading on a different market, could be on the OUCH protocol, and his trading behavior is different. All of these patterns of trading from different clients on different markets can be drilled down into. In this scenario, it helped us look at the protocol implementation of our process and see if we could make improvements. It actually did help us: we found that there was an issue with one of our protocol implementations and had to go and fix it.

When you break it down by client or by market you can see which client is performing better, which market is doing better. Then you can drill down into that and see the trading pattern of that client: Why he is doing well, why the other guy seems a little off.

We have also seen increased productivity from using this solution. If I had to go figure out the latency and then see where the problem is, I would have to do a lot more analysis from my own logs, but that wouldn't be as reliable. If I'm capturing anything in my process then I'm adding a latency on top of my processing, as well as disk latency, network latency, etc. Having a source outside of my process telling me how my process is doing is way better than just doing everything from my process. It has definitely helped us improve a lot of things including our productivity.

What is most valuable?

The latency measurements are the most valuable feature. Corvil has the entire view of our platform, as we would see it in our own internal processes. It has all the decoders, so it's capturing every network packet, decoding it in real-time, and giving us latency information in real-time. It is easier to identify the flow and get quotes whenever we want. It's the real-time decoding and the latency statistics that we find the most useful.

The analytics features are something we are starting to dig into. Until now, we have only tapped into the latency part of the device. The analytics part is pretty good in terms of what it's already generating; you can get some cool visualizations of the data and a lot out of it. But we also want to use the same data in our own internal processes. That's something we want to tap into. We haven't used the full potential of Corvil yet. They're also coming out with a new product called Intelligence Hub, which we are starting to review. It will give more analytics from the Corvil data.

Finally, Corvil provides nice APIs, so we can download the data as a comma-separated file and do our own analysis. We've been looking at that for the last year-and-a-half: how to put that data into a big-data platform and use it for other, even more advanced visualizations and analyses.
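A minimal sketch of that kind of offline analysis, assuming a CSV export with invented column names (`order_id`, `venue`, `latency_us` are illustrative, not Corvil's real schema):

```python
import csv
import io
from statistics import median

# Tiny stand-in for a downloaded CSV export; a real export would be
# read from a file rather than an inline string.
raw = """order_id,venue,latency_us
A1,NASDAQ,41
A2,NASDAQ,47
A3,NYSE,250
A4,NYSE,52
"""

rows = csv.DictReader(io.StringIO(raw))

# Group latencies by venue so each market can be compared separately.
by_venue = {}
for r in rows:
    by_venue.setdefault(r["venue"], []).append(int(r["latency_us"]))

medians = {venue: median(lats) for venue, lats in by_venue.items()}
print(medians)
```

The same grouping works by client, symbol, or any other exported column before feeding the results into a bigger analytics platform.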

What needs improvement?

Before I got the Corvil training I got information from other colleagues who had already used it. One thing that was not very efficient was that every time you had to create a new stream or a new session from within Corvil - if you wanted to capture new traffic that's going through - you had to tell it what protocol the message is going to come through and how to correlate messages, etc. That was not as efficient as I would have liked. 

After I went for the training, they had already added these nice features in the 9.4 version where it could do auto-discovery. That was a pretty cool feature. Based on the traffic that it has already seen, it could create sessions on the fly.

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

The stability is pretty impressive. I have heard that the newest offering of the server is even more awesome. We don't have the newest server yet. Our servers are pretty old, as in a few years old.

We have a couple of flows that are taking a little hit, meaning there is so much traffic coming in that the servers drop packets. But again, that's an old server. We've already spoken to Corvil about it and Corvil has said the solution would be to limit the amount of traffic or get the newest servers that they have. Those have bigger capacity, bigger hard disks, bigger memory, etc.

But the other business offering that we use Corvil for, which I personally support, has no problem at all. There are no dropped packets, it's completely reliable. I wrote what we call our "end of day" process. It takes the data from Corvil and the data from our process and tries to reconcile them, to see if all the data that my bosses get is actually in Corvil. It always comes out 100 percent clean, which means all data is being captured. So the reliability is very high.
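The reconciliation idea itself is simple; a toy sketch (with invented order IDs, not our actual end-of-day process) is just a set comparison in both directions:

```python
# Order IDs as recorded by our own process versus those captured by
# the monitoring device; a clean day is an empty difference both ways.
app_ids = {"A1", "A2", "A3", "A4"}
captured_ids = {"A1", "A2", "A3", "A4"}

missing_from_capture = app_ids - captured_ids  # orders the device missed
extra_in_capture = captured_ids - app_ids      # captures we can't explain

print(missing_from_capture, extra_in_capture)
```

Anything in either difference set points at dropped packets or decoding gaps worth investigating.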

What do I think about the scalability of the solution?

For our business, scalability has not been a problem. We don't add sessions every other day or new clients every other day. We don't add a new exchange every other day. We have not reached the capacity in any way. 

Scalability would not be an issue. But Corvil has mentioned that they have better devices now that can handle three times more traffic.

How are customer service and technical support?

Corvil's technical support staff are amazing. They're very responsive and come back to us immediately. For example, Corvil provides something called the analytic stream, which you can use in real-time. As I said, I wrote a process that does an end-of-day download of the data into a CSV, and then I do more visualizations via CDF. We are trying to see if we can do the same thing in real-time. My process runs at 5:30pm, after the markets are closed; the analytic stream is real-time, so we get the data as Corvil is decoding it.

We emailed Corvil support - myself and a couple of other developers - and said we needed a connector API, because they provide APIs for different sources. We wanted to use Apache Kafka, an open-source platform, so we said we needed a connector for that. They have pre-built, pre-developed APIs. We also said we needed an account, since I did not have a developer account with them. They immediately created an account and put the package there, and it was good to go. This all happened within one hour.

They also have a dedicated salesperson at Corvil and a dedicated business/technical person who comes and trains us and can also do setups for us. They visit regularly to check how we're doing and whether we need help.

Their support is awesome. I'm very impressed with them. I've been in this industry for ten-plus years and I don't know another vendor who works like this.

What other advice do I have?

My advice is: go for it. It's an amazing product. In the new data era, data is everything; data is power and data is knowledge. Corvil does exactly that: it captures data and does a lot of good with it. I think it's a must-have for any company, not just a company that cares about low latency. You can capture any kind of traffic and do amazing things with the data. It can be used for many things, like keeping servers' time in sync and capturing the data at the right time.

I went for training with Corvil about three or four months ago and got my certification. When I went for the training, I realized that we weren't yet using the full potential of Corvil; our company is now starting to. We have a very limited installation of Corvil, as far as I understand. We primarily use it for latency, but Corvil is not just for that. You can do many other things.

I personally wanted to get the certification because I was so impressed with the product and wanted to learn more about it. It has this amazing view of the entire platform. I have my process, I get messages, I decode, I know what's happening. But Corvil is such a powerful device that it can see all of the network traffic. It can see everything that's going through your network.

The data can be downloaded as a CSV file. We use that in our own internal application and visualize it in a different way. One visualization we use extensively is called a CDF graph: the cumulative distribution function. If you visualize the same data in terms of percentile distribution - a CDF, which is different from a normal distribution - you can see that up to the 95th percentile the latency seems to be in line with the median, but from the 95th to the 99th it seems pretty bad, and above the 99th it is very bad. Things like that help visualize the data in a different way. Intelligence Hub is going to be able to do all of that, as far as I understand.
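A rough sketch of reading percentile points off an empirical CDF - the nearest-rank method and the synthetic latency sample below are illustrative assumptions, not our production code:

```python
def percentile(sorted_vals, p):
    """Nearest-rank percentile: the value below which roughly
    p percent of the sorted sample falls."""
    k = max(0, min(len(sorted_vals) - 1,
                   round(p / 100 * (len(sorted_vals) - 1))))
    return sorted_vals[k]

# Synthetic latency sample (microseconds): mostly fast, a thin bad tail.
lats = sorted([40] * 90 + [120] * 8 + [900] * 2)

for p in (50, 95, 99):
    print(f"p{p}: {percentile(lats, p)} us")
```

Plotting latency against percentile in this way makes the bad tail above the 95th percentile visible even when the median looks healthy.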

For us, Corvil is a supplementary tool. It's not used for major business decisions like improving order routing. One of the things we do to make business decisions is look at the amount of flow. Our process itself captures stuff like that. It's not latency data but the amount of flow and the rate at which the flow comes. Corvil is able to capture all of that: the number of orders, the number of cancel/replaces, and the rate, such as 100 messages per second or 1,000 messages per second. You can see all of these breakdowns in Corvil itself. We make decisions on our capacity and the like from Corvil. Other than that, I don't believe we would use it for any other business decisions.

The deficiencies that we felt before, regarding maintenance being a little bit difficult, are completely gone now. We have started looking into Corvil a bit more and we have dedicated people maintaining it. That was not the case previously. There is now a lot more attention on this product and the maintenance has become easier with the newer software.

In our company there are three users of Corvil who are all developers, and there are three users who provide support for Corvil and for our business. We also have three or four business users who get reports. Those users are pretty technical too, because our business is a low-latency platform. Everyone is technical, in other words. Even the business people are tech savvy. They understand technology. So they use Corvil as well.

In terms of increasing our usage of Corvil, as I said, we are looking at this new offering, the Intelligence Hub. Corvil is an amazing device. When I was doing the training, the trainer mentioned that there are financial companies with 200-plus Corvil servers. I was shocked. They use it for everything - phone call monitoring, web traffic monitoring, everything. We use it so little that I realized, "Okay, this has so much potential that we don't seem to be using." It has changed our mindset a little bit. I've explained it to the business, which knew this from the beginning; they were the ones who primarily installed it. I see our use going up in the next few years. I don't know what the pace will be, but there is a lot more focus on it now. As with any bank, things don't happen overnight. It will take time and it will grow.

I give the product a ten out of ten because I just love the product.

Disclosure: IT Central Station contacted the reviewer to collect the review and to validate authenticity. The reviewer was referred by the vendor, but the review is not subject to editing or approval by the vendor.