What is our primary use case?
We are a financial company and we have redundant data centers, with a VMware Metro Cluster stretched between the two locations. We have Rubrik running in our data center and it is used for backing up our on-premises infrastructure.
We keep the backup of the environment on-premises for two weeks, just to be able to restore in case we lose or corrupt part of the virtual infrastructure. We also send copies of some of the data into the cloud for long-term archiving because we're under a regulatory requirement to store certain parts of the business data for up to seven years.
At this point, our environment is probably close to 90% virtual. We use physical servers for market data and essentially, there is nothing to back up on those systems because there's no data that's worth saving there. Should one of these servers fail, we just put a new one in place. It would be deployed, including the operating system, and it would start processing market data for us. We consider these as compute nodes and there is no persistent data on them.
We are highly virtualized, so Rubrik is used to back up most of the VMs. We are running VMware ESXi for our VMs, and application-wise, we are a Microsoft shop, so we back up SQL Server, Exchange Server, and Microsoft file shares. We also back up a lot of business data that lives outside of those servers.
How has it helped my organization?
The biggest impact that Rubrik has had is that it lets us rely on the backup, knowing that the data is there and that the ability to restore is there. It provided the safety net we needed to deploy faster, and it played a great role in convincing developers and operations to do rapid releases. In the old way, where we didn't have reliable backups, we had to wrap every release in a solid recovery plan in addition to the rollout itself. Now, we have confidence in the backup and can release faster.
Rubrik has saved us time with managing backups in general. For recovery testing, the SLA policies have greatly reduced the time that we have to babysit backups. This is simply because Rubrik put thought into designing their system the right way. Instead of adding a server by creating jobs and then creating schedules on top of the jobs, you just drop it into an SLA and all of the legwork is done for you, so adding systems is easier.
Because they're SLAs, I don't need to go through the job log and analyze it to figure out why there was a job failure. Similarly, I don't need to look into the impact of the failure. I know that a machine is protected within SLA guidelines, and that I will get an alert if there is a problem with it. An alert means that I need to act and somebody needs to take a look. Essentially, it has removed a lot of repetitive babysitting steps that don't really produce any business value.
We have never had a problem where Rubrik has saved us downtime, but it's certainly great to have this additional safety net of a reliable backup solution. Everything we have is redundant, so even if there is a hardware failure, another piece of hardware kicks in. We don't rely on Rubrik specifically for disaster recovery, but we do rely on it for business continuity. If, for whatever reason, both of our data centers lose power or internet, or are inaccessible, then Rubrik will help us rebuild the environment. What we don't rely on it for is daily disaster recovery.
As we moved away from our previous solutions, using Rubrik has improved our overall efficiency. These days, we rarely have to do anything with the systems. Most of the time when we have to resolve an issue with the backup, it's because the target system has become unavailable or has been taken offline for maintenance. It may also be the case that we have another restore request. These are the only two reasons that a restore might be delayed. It is nothing like what we had with NetBackup, where we had to update the agent and software. We don't have to do anything of that nature. Backup is now pretty much gone from our weekly schedule.
What is most valuable?
The most valuable features are reliability and programmability. We have a great success rate for backups with Rubrik, and because tasks are easy to automate, we also run periodic restores to check the quality of the backups.
Rubrik makes it really simple to automate the restore task, which is important because I don't care about the backup. I care about the restores, and Rubrik did a great job of assuring restore reliability.
Our time spent on recovery testing has improved simply because we're able to automate it. It saves us between two and four hours per week, whether it is simply adding a new machine or going through the logs and seeing what failed.
We don't do recovery on a daily or weekly basis. We receive between two and four recovery requests per month. Because it is mostly manual stuff, it is comparable to the old system if we're talking about restoring something within a two-week timeframe when it's still on disk. However, if we're talking about restoring from the cloud versus restore from tape, the timeframes are not even on the same level. This is simply because we use the offsite storage for tapes, so sometimes the restore task from tape will take weeks.
The web interface is easy to navigate and pleasant to look at.
The SLA-based policy has simplified our data protection operations tremendously. It goes back to caring about restores instead of backups, and the fact that it allows me to easily drop systems into the SLAs greatly reduces the amount of time it takes to set up the system for backup.
It allows me to create a protection policy and while it's running, I know that the systems that I've assigned to that policy are being protected accordingly. If that is not happening then I get an alert or a notification telling me that the systems are outside of the protection horizon. It's a great approach.
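To illustrate the idea only, here is a minimal Python sketch of the kind of check an SLA-based policy implies: flag any system whose newest good snapshot is older than its policy allows. This is not Rubrik's API or implementation. Every name below (SlaPolicy, ProtectedSystem, out_of_compliance, the "gold" policy, and the 24-hour window) is hypothetical and chosen just for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional


@dataclass
class SlaPolicy:
    """Hypothetical policy: the maximum allowed age of the newest snapshot."""
    name: str
    max_snapshot_age: timedelta


@dataclass
class ProtectedSystem:
    """Hypothetical record of a system assigned to a policy."""
    name: str
    policy: SlaPolicy
    last_good_snapshot: Optional[datetime]  # None means never backed up


def out_of_compliance(systems: List[ProtectedSystem], now: datetime) -> List[str]:
    """Return the names of systems that are outside their protection horizon."""
    late = []
    for system in systems:
        snap = system.last_good_snapshot
        if snap is None or now - snap > system.policy.max_snapshot_age:
            late.append(system.name)
    return late


# Example data (all made up for illustration).
gold = SlaPolicy("gold", timedelta(hours=24))
now = datetime(2024, 1, 10, 12, 0)
systems = [
    ProtectedSystem("sql01", gold, datetime(2024, 1, 10, 2, 0)),  # 10 hours old
    ProtectedSystem("fs01", gold, datetime(2024, 1, 8, 2, 0)),    # 58 hours old
    ProtectedSystem("new-vm", gold, None),                        # never backed up
]
print(out_of_compliance(systems, now))  # ['fs01', 'new-vm']
```

The point of the sketch is that the operator only hears about the exceptions. Systems that are inside their window produce no work at all, which matches how we use the alerts.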
The archival functionality is impressive. Just by eliminating reliance on the tape technology, it's greatly improved the rate of successful restores that we were able to perform. In two and a half years, I can't remember a case where we couldn't locate data that was backed up using Rubrik.
We have not needed to use the ransomware recovery function but I know that Rubrik backups are essentially immutable. Even if an intrusion does happen, we'll be able to restore the data quickly.
I have used the rapid restore functionality and I noticed that on many occasions, I was able to mount a virtual machine or database on the Rubrik cluster itself. So, I know its high-speed connectivity options are excellent and support VMware well.
With the previous version, we had to do some Python scripting because the API was better and more developed than the PowerShell support. However, with the new version, it seems that PowerShell covers all of the functionality that we need, which is great, especially because we are a Windows shop.
The restore success rate is very good. I don't care so much about improving the time spent on the restore. Rather, it's the success rate that matters. At this point, we have a 100% success rate, which was definitely not the case with any prior system that I've used.
What needs improvement?
I would love to be able to get from the dashboard straight to a file or a system that I need. I believe that right now, there's the ability to search by system name, and then it will take you to the system. It would be great if I could reduce the number of clicks needed to do a restore, going maybe to a system and then the file, or maybe just directly to the file. It would be like continuous integration with PowerShell.
As we go into the Cloud in addition to Polaris, I would love to see a future where I can back up pieces of the Cloud, perhaps ARM templates or Azure Active Directories from the Cloud to on-prem. I know it sounds counter-intuitive, but just as the Cloud becomes more popular and used on a daily basis, I would love to have just a single pane of glass to provide visibility into the backups.
For how long have I used the solution?
I have been using Rubrik for approximately three and a half years.
What do I think about the stability of the solution?
In addition to great recovery rates, we haven't had any unforeseen outages with Rubrik itself, due to hardware failure or anything like that. Even the Rubrik software upgrades are non-disruptive. Because there are multiple nodes in the chassis, Rubrik never actually goes down during an upgrade and can continue doing the backups on the nodes that are not directly affected.
What do I think about the scalability of the solution?
This is a well-designed product, so adding more space is as easy as adding another chassis. It is great functionality because adding more storage is like adding more bandwidth and more connectivity. That's a great design.
We are a fairly small organization, so probably five to six people have access, and there are probably three or four who use it. We centralize Rubrik to our IT systems and IT help desk, so it's all managed internally. There is enough flexibility to extend it to developers and give certain people rights to certain restores. It's just that the workload is so light that it doesn't make sense for us to constantly keep training users on how to operate it. By the time they need to perform a restore, they'll forget it all and have to come back to the help desk anyway.
If in the next version of Rubrik they announce new ways to back up Azure or Office 365, I would jump on the offer. The main driver for us to purchase additional Rubrik units would be if we were constrained on storage. As of right now, we have sized it correctly, so we have plenty of storage to satisfy the SLAs for the data that we need to store in-house.
If our data consumption or data storage requirements increase, and we suddenly need more storage for data protection, we will look into adding units. At this point, we are properly sized for the performance.
How are customer service and technical support?
Our experience with technical support has been great. We had a couple of questions in the beginning, so we interacted with them about two and a half years ago. You email them and get somebody on the issue without having to exchange many emails.
They will do the upgrades for you, so lately, probably over the past year, the only interaction we have had with support is when we needed to do an upgrade. It's a great experience where you just open up a support ticket with them, they open up the secure remote channel, and they come in to complete the upgrade.
Which solution did I use previously and why did I switch?
Prior to Rubrik, we used Veritas NetBackup for the backup and CommVault for the tape system. We switched to Rubrik because our success rate was poor. The restore rate was horrendous, especially when we had to go to the tape system. It was hovering around a 75% success rate.
How was the initial setup?
The initial setup is extremely straightforward. We went through the exercises and provided the configuration details that were required from us. I think that they were as simple as supplying IP configuration information. Then, once all of the racks and wires were assembled, the Rubrik technician showed up, configured the system, and it was all done in probably less than 20 hours in total.
Because we're virtual, our implementation strategy was simple. Essentially, once the Rubrik system had been configured, all we had to do was point it to our VMware vCenter servers and from there, it automatically picked up all of the virtual machines that we had. Then, it was just a question of assigning them to SLAs and removing them from the old backup system. That final piece is not included in the 20 hours, because the 20 hours was just to get Rubrik running. But, it was extremely easy to integrate.
What about the implementation team?
We worked directly with Rubrik to help with the deployment.
For maintenance, you really don't need more than two people, and that's for redundancy purposes. You can have a single person manage terabytes of backups.
What was our ROI?
By now, we have probably made the money back in reduced support costs. Beyond that, we don't value this type of product by how much money it produces. Simply, the compliance requirements come with steep fines and other repercussions if they are not adhered to. Because this product gives us assurance in our ability to restore data if needed, it satisfies our compliance requirements.
What's my experience with pricing, setup cost, and licensing?
You get what you pay for. Rubrik was probably the most expensive solution but in the long run, it's justified by the value of the data that it protects. We were able to make a case that it's a good investment.
They have a very straightforward pricing model.
Which other solutions did I evaluate?
We evaluated a couple of other solutions, but Rubrik offered the best appliance. We looked at products from Veeam and the present solutions from Veritas and others, but it looked like Rubrik was the most modern solution.
What other advice do I have?
I am familiar with the predictive search but we're not employing it. Usually, when we need to restore, we have to restore the whole machine or we know the location of the file or data that was deleted.
We've considered using the Polaris SaaS-based framework as we're looking into leveraging the cloud a little bit more. Polaris is definitely on our radar, but we're not using it in our day-to-day operations.
I would rate this solution a ten out of ten.