What is our primary use case?
We use it for backup and recovery; system protection is an even better way to describe the use case. We not only have a backup of our data but it also provides disaster recovery. While having your data is important, having the ability to return to production within minutes of an issue, which means standing up the whole VM at a point in time, is way more important in today's world than just having a copy of your data.
It's protecting both physical and virtual environments. It protects Windows, Linux — multiple flavors — as well as Microsoft SQL and Oracle Databases. We have two bricks and we're protecting about 175 machines, both physical and virtual. It has been about 98 percent VMs, and probably even higher than that. We're a VM shop.
We are protecting all of our SQL databases with the native SQL tools within Rubrik, through which we can do all of our table restores at a point in time, depending on the database itself. We are using multiple archives with different SLAs, both on-premises and in AWS.
Our deployment of the solution is primarily on-prem.
How has it helped my organization?
The SLA-based policy has had a positive effect on our data protection operations. I'm going to be going even deeper into the automation part, to use some of the newer features that have just come out in this release. It's going to be great to be able to just tag a machine in Virtual Center and its backups will be taken care of. That will help our process in terms of protecting machines that need to be protected and it will remove a step that people don't necessarily remember to do.
In the 5.0 release, they added the ability to back up Oracle Databases, natively, similar to how SQL servers are done, and that's going to be a big win for us. Hopefully it will reduce our storage size because we back up many databases that have a lot of the same data in them. Today we can only do it as a big blob so we don't get any space savings in that respect.
In terms of recovery time, it saves us days. The last time that we had a major system go belly-up, it was three or four days before we had the system back to being functional. In contrast, when a system was being a pain due to some vendor patching, I was able to return it to a known state multiple times, each time within minutes. Granted, they weren't the same systems, but it would have been virtually the same thing if I had been able to do that with the major system that was down for days. Fortunately it doesn't happen that often, but in that particular patching case it felt like I did it about ten times. The vendor patch was not going well, so I had to keep rolling the system back in a very short amount of time. But typically it has not been that much of an issue.
In addition, with the multi-tenancy feature that they added back in one of the later 4-trains, we've been able to give another team within our organization access to manage their own backups and only see their servers. They are able to touch and change things only for their own systems. In theory, that also gives them the ability to do their own restarts if they ever need to. Our previous system had really no way to handle that, so it's been pretty fantastic.
Overall, I would say Rubrik has saved us a lot of time managing backups. I used to spend a minimum of about 50 percent of my time just nursing our backup system. Now, I might spend one percent of my week looking at the backups. There's not much that I need to do, other than keep an eye on the system to make sure that nothing crazy has gone on. I spend virtually no time, at this point, making sure that the backup system is still running. It's been a big help. Since I'm not spending as much time dealing with the backups or doing any sort of recovery, we have been able to actually work on other projects and other needs of the organization.
It has also helped to reduce downtime. We had one production server that went down and we were able to get it back up in just a couple of minutes. In comparison, if we had needed to rebuild that entire server, that would have taken days, and possibly longer, due to needing to reload the applications. That is sometimes not a trivial matter.
What is most valuable?
The database backups, where you can go to a point in time, are huge.
The instant-on recovery is another huge bonus to the system. It lets you get a system back up and running within minutes if you need to, instead of having to try restoring it all out to your primary storage. That becomes a huge deal when you have a system that's down and people want it back up as soon as possible.
The archiving, off-of-box, is awesome. It lets you put your data where you want it and gives you the peace of mind of having more than one copy of it. And it's smart about the way that it does the archiving. It doesn't just copy one-for-one. It does all of its processing of the deduplication and compression before it sends it off to the archive, which helps with our cloud costs. Before, we weren't doing anything to the public clouds. But the amount of storage that we're actually storing in AWS is a lot smaller than what it would have been if we had just done a normal copy-out.
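To make the archiving point concrete, here is a minimal sketch of why deduplicating and compressing before upload sends fewer bytes than a one-for-one copy. It is an illustration only, not how Rubrik implements it: the fixed chunk size, the function names, and the sample data are all assumptions of mine.

```python
import hashlib
import zlib

CHUNK_SIZE = 4096  # hypothetical fixed chunk size, chosen only for illustration


def naive_copy_size(snapshots):
    """Bytes sent if every snapshot is copied to the archive one-for-one."""
    return sum(len(snap) for snap in snapshots)


def dedup_compress_size(snapshots, chunk_size=CHUNK_SIZE):
    """Bytes sent if each unique chunk is compressed and uploaded only once."""
    seen = set()  # digests of chunks already sent to the archive
    total = 0
    for snap in snapshots:
        for start in range(0, len(snap), chunk_size):
            chunk = snap[start:start + chunk_size]
            digest = hashlib.sha256(chunk).digest()
            if digest in seen:
                continue  # duplicate chunk: nothing new to upload
            seen.add(digest)
            total += len(zlib.compress(chunk))
    return total


if __name__ == "__main__":
    # Made-up data: two daily snapshots that differ in a single chunk.
    day1 = b"".join(bytes([i]) * CHUNK_SIZE for i in range(64))
    day2 = day1[:CHUNK_SIZE] + bytes([200]) * CHUNK_SIZE + day1[2 * CHUNK_SIZE:]
    print("one-for-one copy:", naive_copy_size([day1, day2]), "bytes")
    print("dedup + compress:", dedup_compress_size([day1, day2]), "bytes")
```

Real data will not compress as well as this artificial sample, but the shape of the saving is the same: unchanged chunks are never sent twice, and what is sent is compressed first.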
Rubrik's web interface is fantastic. I can get to it from pretty much any device. It's responsive, it's simple, it's clean, and it's easy to find stuff. One of our main goals when we picked the solution was that it would be simple to use; that someone could use it without having to go through a lot of training. In an emergency, if someone else needed to log in and figure out how to do something, they most likely could do it through the web interface. It's definitely user-friendly.
What needs improvement?
There are some improvements that could be made to the web portal itself to make life easier. It comes down to the usability, to being able to use the system wherever you are. While it's pretty user-friendly, there are little quirks to it that could easily be changed.
Also, the deployment and configuration of the backup service is something that could be streamlined a little bit, particularly when you're trying to do a SQL workload. You have to install a backup service on the server. You only have to do it once and then you're done, but you have to do that on every server that you want to protect. We are backing up about 170 servers at the moment. There isn't an onerous number of tasks, but there are some things that you have to remember to do. And if you haven't done it at all, or not in a long time, you may or may not know to do them. I would think that, like in the installation wizard, they should be able to step you through that type of stuff, or at least give you a reminder. It's something simple but something that could be improved.
For how long have I used the solution?
We have been using it for just about two years.
What do I think about the stability of the solution?
It's been very stable. We've not had any issues with the system. It has performed well since day one and we're on our fourth or fifth different code line.
What do I think about the scalability of the solution?
Scalability is pretty simple. We initially started off with our two bricks in a replication pair, and then we needed to bring the replication target into the main system. I worked with support, decomm'ed the replication target, got that brick reset, and then brought it into the cluster. That took just a couple of hours, and that included physically moving the box. It was extremely simple and, once it was in, it operated just as you would have expected. All of the certificates copied over and I was able to contact all of the nodes exactly how I would've expected. It was pretty seamless.
Performance-wise, we might be using five or ten percent of the performance that's available through the system. After that initial ingest, you're only really copying changes, and most of our changes are relatively small in comparison to what the system can actually handle.
In terms of features, we're only using five or ten percent of the features that are in the system. I was working on using some other features and then the need went away. It involved taking a snapshot of a database from one server and restoring it onto another server, but the need went away so I stopped working on that.
As new things come out, they move us forward. They just released a feature for cleaning up the archives. I must've missed it in one of the release notes, so when I ran across it I said to myself, "Oh, I better go in and enable this." Lo and behold, it did exactly what we needed it to do and it saved us double digits of terabytes on our archive locations, which was great because we were running out of space. When they added the ability to link VMs between virtual centers, I enabled that one. As new features are released I'll implement them. There are quite a number of features, such as all of the integration with NetApp and Pure Storage, which I can't use because I don't have that storage. I can only use the features that make sense for us.
How are customer service and technical support?
Tech support has been fantastic. They will bend over backward to help find solutions. The biggest thing that we use them for is upgrading the software. Since they have global support people, I don't have to either patch a system in the middle of the day or change our backup windows. They have someone available after our backup window ends but before the beginning of our business day. It's not in the middle of the night for them either; they're coming in at their normal time. It's been great. Plus, on the human side, they're not forcing people to work a third shift to support us on the other side of the world. They give someone a normal shift and make the support experience positive.
Which solution did I use previously and why did I switch?
We had been using ARCserve for about 18 years before we switched. Sometimes it's referred to as CA ARCserve and sometimes just ARCserve. It went through a couple of different incarnations. It got spun off at one point, so it's hit or miss as to what it's known as.
We decided to switch because our system was way out of date, and in terms of performance, our backups were taking so long that we couldn't actually complete them. The restore time was abysmal. It took days to restore if we needed a large chunk of data. The maintenance of it, in terms of the human capital, was intense. As I said, I was using at least 50 percent of my time per week just trying to make sure that the backups completed, as much as they could, for that week. We were starting to run into the scale issue, where we couldn't back up our data and export it off to tape within any amount of time that was reasonable.
We were also way out of space. One of the biggest management issues was that I had to keep moving stuff around. I had to arrange things such that, "That job has got to go over here because there's enough room for it. And this job has to go over here because there's enough room for that one." We did a project and we came across Rubrik and it was the best decision that we've made.
How was the initial setup?
The initial setup was very straightforward. It was the easiest setup that we had on all of the systems that we had looked at before we bought them. When we moved from our PoC to production, we actually handled the setup of the second brick when it came in. We didn't even need to engage their field engineers to help us.
There were two of us involved, me and a colleague who is the senior network engineer. The deployment took about four hours. We actually redeployed both of them, the whole system, within four hours. We tore up the old PoC stuff, refreshed it all, and then started over with it because some stuff had changed and we needed to restart it. We did the whole system within about four hours.
In terms of implementation strategy, we cut over from our old system as fast as we could. We started with our large and most important system. We let that sit there and bake and perform its initial backup. Once that was done, we started porting every machine over that we could. It was great with the way that the system worked. We just went through our list of systems that we needed to move. He started at the top and I started at the bottom and we just checked them off, made sure that we got them all in. We then stopped all of the legacy jobs on the previous system and we were up and running on the new system within less than a week.
What was our ROI?
The biggest ROI comes from savings on a lot of hidden costs. With the lower amount of management time, I've been able to focus on doing a whole lot of other work. Nobody has done a full ROI comparison, but just in my time savings it's been huge. I've not needed to do a whole lot of work on the Rubrik system itself.
What's my experience with pricing, setup cost, and licensing?
We pay yearly and it's based on the number of bricks. Each brick has a set cost, which I don't know off the top of my head. I don't handle the money side of things.
We have not had any other costs from them since we did the initial purchase. The only other things that I know you can buy are some of the connectors to the cloud: cloud-on and cloud-out. But we're currently not using them.
Which other solutions did I evaluate?
Our project started off with eight vendors. We whittled it down and PoC'ed four of them and ultimately chose Rubrik.
The ones we focused on were a Veeam/ExaGrid combo, and Cohesity was another one. We also looked at the newer product from ARCserve, their UDP product.
- The main difference was simplicity. Rubrik was head-and-shoulders above the rest of them in terms of ease of use.
- There was also the installation of the system and the infrastructure to run the system. Rubrik was head-and-shoulders above the other three.
- Performance-wise, in terms of raw numbers, Rubrik was not the most performant one, but that's also due to the way they value the systems in production. They don't try to stun the workloads while they're backing them up. You can work with support and change that, but that really only comes into play on your first ingest. After that, they were as performant as some of the other ones and way better than some of them.
- The last thing was that what they said they did — the features they had and what they said would happen — actually did happen.
When we were evaluating the agent, or as Rubrik calls it, the backup service, theirs actually worked. One of their competitors' agents did not work and we were told that it was our fault that it didn't work, and for it to work we would have had to rebuild all of our Linux systems to meet their recommendations or specifications. That was a huge negative on their side, but a very big positive on the Rubrik side.
What other advice do I have?
Look at how your SQL databases are being backed up. If you're doing the industry standard of dump and sweep, migrate off of that as fast as you can. Get to the point where you're doing the native Rubrik backup for your databases as fast as possible. The industry-standard way can very quickly kill how much you can store on your systems. That, in and of itself, is one of the biggest things that we learned the hard way. We thought we had a lot of time to move off and it bit us pretty hard for a period of time.
Another big lesson I've learned from using the solution is that you should use the system the way it wants to be used. There's a big mind change that you have to go through, to understand the way that the system works, depending on what you are coming from. We thought we had a good grasp of what we were actually backing up. But it turned out that there was a lot of hidden data growth that we were not expecting. That was mainly due to the fact that we had no good way of getting that information out of our previous system. If I had known then everything I know today, I would have bought more. But that comes with the territory of 20/20 hindsight. And with bad data, there's only so much that you can do.
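As a rough, back-of-the-envelope sketch of why dump and sweep can eat storage so quickly, compare keeping a full dump per retained day against keeping one full copy plus daily changes. This is a simplified model with made-up numbers, not Rubrik's actual storage accounting; it assumes each nightly dump file lands as a brand-new full copy, and the function names and the two percent change rate are hypothetical.

```python
def dump_and_sweep_tb(db_size_tb, retention_days):
    """Storage if every retained day holds a brand-new full dump file."""
    return db_size_tb * retention_days


def incremental_tb(db_size_tb, daily_change_rate, retention_days):
    """Storage for one full copy plus only the changed data on each later day."""
    return db_size_tb + db_size_tb * daily_change_rate * (retention_days - 1)


if __name__ == "__main__":
    # Hypothetical example: a 1 TB database, 30 days retained, 2% daily change.
    print("dump and sweep:", dump_and_sweep_tb(1.0, 30), "TB")
    print("incremental:   ", round(incremental_tb(1.0, 0.02, 30), 2), "TB")
```

Under these assumptions the gap grows with every retained day, which matches the experience of running out of room faster than expected.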
Rubrik's Polaris, the SaaS-based framework for extracting metadata, sounds very interesting. We've not gone down that route at this point, but it's something that we'll be taking a look at within the next year or so.
In terms of maintenance of the system, it's pretty much just me. I'm the only one who really maintains it and, as I said, I might spend about one percent of my week dealing with the backups. It's very low maintenance.
Rubrik is a ten out of ten for sure, hands-down. They've been great. It's been one of the best engagements with a company that I've ever had.
Which version of this solution are you currently using?