We have two use cases. The primary one is a backup target for our backup software and we also use it for archiving off of our primary storage.
The biggest impact, which really doesn't get noticed until you need it, is in our data protection environment. Up until we utilized SwiftStack, we were spinning physical tapes. We now have a disaster recovery facility at a co-lo, and our SwiftStack cluster extends to that disaster recovery location. Because it's a backup target, all of our data gets replicated to our offsite facility automatically.
It has enabled us to store more data with a smaller headcount. We had 50 tapes and managing tapes can be interesting at best. You have to know where they are, you have to keep them in an environment where they don't degrade rapidly. You have to have a person who's dedicated to really managing where the tapes are and what's on the tapes, how they get written, how they get copied, and how they get recovered if you have to recover data, which is huge. That whole process - and we're a fairly small environment - was half to three-quarters of an FTE engineer. That is a person we can now use somewhere else because the data just goes into SwiftStack and it automatically goes to our offsite facility. We don't have any spinning tapes. We don't have to worry about buying tapes. We don't have to worry about tape attrition. All the management around that really went away. The data protection software actually does the management. It's all automated. It has saved us a lot of time.
Another area where it has helped is that we're not expanding our primary storage like we would have had to, since we're leveraging the lower-cost SwiftStack storage system. Enterprise storage requires another, say, half of an engineer at some level, to manage it, keep the storage correct, move things around. I manage our enterprise storage and it can take a considerable amount of time out of my day. I log into SwiftStack just to look at how much I'm using and see if I have reached capacity yet. I might look and say, "Oh, look, I have a couple of drives that failed. I'll get to those in a couple of weeks." Whereas, with my enterprise storage it's, "Oh no, a drive has failed. I have to fix it today." It reduces the amount of overhead.
The other way it's added value to our workflows is the data archiving, where we take data off our primary storage and put it into our object store, SwiftStack. It is much easier to manage from a growth and maintenance standpoint, which again, saves time and therefore money.
In terms of hardware flexibility, with SwiftStack not being a hardware company, I literally buy whatever hardware is the least expensive, from any vendor. When I have hard drive failures, I usually wait, and then, once a month, I'll go to Amazon or NewEgg and look for the drive with the lowest cost per terabyte. Currently, 10 TB drives seem to be the least expensive per terabyte. I'll just buy that drive and swap the failed drives out once a month. I let hardware fail and then I replace it with a larger-capacity drive. Over time, that increases my capacity without increasing my overhead. So from a flexibility standpoint, I think it's fantastic. I can go to anybody, anywhere - any vendor - and get my hardware.
There was a large cost saving over using our primary storage, which is our traditional enterprise NAS. We used a single supplier for our hardware, but we always look around when we buy new. The ability to grow, both vertically and horizontally, because of the SwiftStack deployment model, is so easy that we really don't even think about it. We literally say, "Hey, we need to get more storage. Do we need capacity or do we need performance?" Performance really hasn't been an issue, so in some cases, we may just buy SAS JBOD shelves and hang them off our existing servers, which we've done once already. Or we might buy lower-performing servers with a lot of storage slots and add nodes across our cluster. That saved us quite a bit of money.
I can't say exactly how much it has decreased our cost of storing and utilizing data, and the big reason is that data growth doesn't stop. But I know it's less expensive. With our enterprise storage, to get the same level of protection that I get with SwiftStack out of the box, I might have to pay for my initial storage, then whatever protection level they have - whatever RAID level - and then I'll have to buy two or three software packages on top of that to do the mirroring and data protection at that level. And then, I might actually have to use more backup software licenses to back up that data as well. By the time that all adds up, we would probably be at the $4,000 per terabyte level. I know what my backup software costs, and it's outrageous, per terabyte. Whereas, with SwiftStack, I'm $500 or less per terabyte. Done. All in. Those numbers are general numbers. They're not accurate by any stretch of the imagination. But they are pretty close to the level of difference between the two.
The biggest feature, the biggest reason we went with SwiftStack, rather than deploying our own model with OpenStack Swift, was their deployment model. That was really the primary point in our purchase decision, back when we initially deployed. It took my installation time from days to hours, for deployment in our environment, versus deploying OpenStack Swift ourselves, manually. Since then, there has been a lot of value-add that we've gotten out of it with the SwiftStack Gateway and ProxyFS, and the Metadata Search that they've added over the years.
It performs much better than I expected. We have a fairly large capacity network that supports our SwiftStack hardware. So performance, for us, has really never been a large issue. As a backup target, which is probably the one place that it would matter, our max ingest rate is about 2 TB per hour, which is more than adequate for our needs.
When I started, SwiftStack was just a deployment model for OpenStack Swift. It did what it needed to do. There may have been some room for refinement procedurally in how to do certain management tasks.
Over the years with ProxyFS, I don't know where the improvement would lie - whether it lies with SwiftStack or other vendors - but, in general, we need a model where more third-party companies support open REST APIs into object stores such as OpenStack Swift. I really don't see any great strides being taken to do so. There are a few, but a lot of them support the big players, like Amazon S3, and not the open-source players.
The other thing that I've been looking for, for years as an end user and customer, for any object store, including SwiftStack, is some type of automated method for data archiving. Something where you would have a metadata tagging policy engine and a data mover all built into a single system that would automatically be able to take your data off your primary storage and put it into an object store in a non-proprietary way - which is key. A lot of them say they do it, but they do it with their own proprietary wrappers and then, if you don't have their system sitting in front of it, you can't access your data anymore. I think that's where SwiftStack is going with its value-adds. If they do that, they'll have a storage system that kills all the others.
When we first deployed, we had a few issues. SwiftStack is based on OpenStack Swift, and this is where I love the SwiftStack support. There were, initially, some bugs in the code, and I would work with their support staff, literally all night long, and within two days we'd have a code fix. So we did have some stability issues early on, two-plus years ago. Over the past couple years, there have not been a lot of stability issues at all, and that includes my doing really silly things with the deployment; either a mistake in configuration or doing something that, on paper, seemed right but wasn't. Even that was so easy to recover from.
I've never lost any data at all through any of it. We've had nodes go offline. We've had some pretty crazy stuff happen, and I'm extremely impressed. We've never lost any of it.
As far as I'm concerned, the scalability is endless. Scalability is not an issue. I don't even think about it. If I need capacity, I purchase the SwiftStack license, purchase my hardware, and off we go.
Their support staff is second to none. They're the best support staff I've ever worked with, with any vendor of any caliber, in the past 20 years. That includes day-to-day support and help with setup issues I've had. I hold them up as the model for every other vendor, and that includes big, corporate enterprise vendors which have support that is absolutely horrible these days. And I've been doing this for 30 years. SwiftStack has just been a pleasure.
At a previous employer I had experience with Amplidata, Core System, and DDN WOS. The way we got to SwiftStack is that I came from a medical facility where they did research into genomics, Big Data. We were leveraging object stores there for data archiving, and to have less expensive, long-term storage for large amounts of data.
When I came onboard here in 2014, I looked at their needs and said, "Hey, they have the same exact problem, just on a smaller scale." And that's how it started. I proposed it, brought it in as a backup target first, and that worked so well that we started extending it into data archiving.
For us, the initial setup was complex. We have two regions, with two zones in one and one zone in the other. The most complicated part of our configuration is the network connectivity to our disaster recovery site and having enough bandwidth to that site. We've been lucky that we have good bandwidth to our disaster recovery site.
In general, I don't think it's more complex than any other solution, and probably less so than a lot of these, "converged, hyperconverged, super-duper-converged," computational clusters that are being built now.
Once we had the hardware and everything was in place, the initial deployment took me a day, and that was with nine nodes and three zones. We started out with a petabyte of raw storage, and the deployment was literally a day. It took a very short period of time.
Long-term, we had always intended to have an offsite facility. When we first deployed, we decided to keep everything at our one site and get some learning done with it, all in one data center. That also helped reduce the time the deployment took: everything was in one data center, with all the networking within two or three racks. It was quite easy to deploy the nodes. That was always our strategy.
When we first deployed it, it was intended to be a backup target. It took us quite a while to work with our backup vendor to get them to support it properly so we could use it. It was not the fault of SwiftStack. In fact, they helped out considerably in working with the data protection vendor to help them support the object store.
Once that was done, it was pretty much just leveraging the system. We've always had that model of: first do it onsite, then do it offsite. It was always going to be primarily for backup and then, eventually, move into archives. We're currently doing more and more archive. Right now, half of the capacity is as a backup target and the other half of the capacity is for data archives, digital assets.
We needed one person for the deployment: me. We're a really small shop, and that may be why it didn't take me long to deploy, because I'm also the network guy, the primary storage guy, the enterprise compute guy. I didn't have anyone else to get things done. I did everything. Of course, I had people help me rack and stack, and help me cable, etc. But the actual deployment is literally so easy that it's really a couple of mouse clicks. And it's just me for maintenance.
We didn't involve any third-parties. SwiftStack itself helped us out immensely.
The ROI is both monetary and in time. The only way it could get any cheaper would be if we stopped generating data.
That's really the bottom line to all this. Until people stop generating data that we have to store somewhere, we're going to be paying money. It's just how best to utilize the money we're spending on it. My opinion is that using SwiftStack is a good way.
All in, with hardware and everything else - and I hate to say a dollar amount because it's been a while since I computed it - I know I'm under the $300 to $500 per terabyte mark. I call that my "all in" price, which has replication built in and protection built in. It's not like you're spending that dollar amount on enterprise storage, and then you have to buy some software for replication, buy other software to do mirroring, and other software to actually back up that data in case your storage itself dies.
The costs of enterprise storage add up when you look at everything you need to run it. With SwiftStack, I can use a whole data center in my current deployment model. The price per terabyte that we pay for the SwiftStack software doesn't seem all that much to me right now. It may in the future, but right now, it doesn't.
You don't need the fanciest hardware out there, but you need enterprise-grade equipment. The really important thing is to map out the connectivity of all your nodes and know how the data traverses the network so you don't create bottlenecks in your data transfer: communication between the nodes, and front-end communication. Those are really the two big things. Everything else is pretty straightforward once you have your configuration down and your network path down. Everything else is relatively easy.
I buy SwiftStack storage, generally. Actually, that's all I've bought in the past two years. Primary storage stays pretty constant. I've actually reduced the amount of primary storage by 20 terabytes over the past two months. That's really how we're trying to do our storage. We'll buy SwiftStack and reduce the amount of primary storage, the amount we keep up at our primary level, and just buy the archive storage. Honestly, 70 to 80 percent of the storage we have is unstructured data that we don't need to access more than once or twice a year.
We do not currently use the 1space feature. That is on my roadmap to get deployed. We really don't have a big "cloud strategy". We don't use off-prem storage. All of our storage is on-prem at the moment.
As for working with petabytes of data, most of the bottlenecks for ingest aren't at the SwiftStack layer. They're generally at the application layer, or depend on the method you're using to ingest the data and how threaded it is. Going back to the example of our backup software, where I see the metrics all the time, at a rate of 2 TB an hour, I'm really CPU-bound by the systems that are pushing the data into SwiftStack. If I had more nodes to send data into SwiftStack... It's not a SwiftStack bottleneck. It's in the end-user space or in computation on the client side more than the SwiftStack side.
I don't think SwiftStack has enabled us to store more data than we did before. The reason is that deduplication is part of the license you get. For our backup target, we deduplicate the data before it goes into the object store. In that regard, it's probably a wash. For the archive data, there's no deduplication, but the idea is that the cost is so low that it's also a wash, or we come out a little better. Would it allow you to store more? I'd say it's break-even. At the end of the day, it's cheaper, so I can add storage. The cost of putting storage there, even with the larger amount going into it, is still cheaper than our enterprise storage. You're really bound by either drive capacity or data center space.
Initially, in my test environment, I used some older servers that we had kicking around which we were in the process of retiring. Those particular servers - it sounds silly - really weren't conducive to SwiftStack because they had a lot of features that actually prohibited easy maintenance. You really do want no RAID controllers or anything else. We quickly learned that the least expensive servers were absolutely the best servers to have for SwiftStack. During the first month or so, we used old hardware that we had kicking around. Then, when we decided that this was definitely something we wanted to pursue, we bought enough for what would be all on-premises, all on a single site, and deployed production on that. Later, we got our disaster recovery facility and we literally moved one set of nodes to our DR site. That all happened transparently to end users. It just worked. They didn't notice anything going on during that time.
We have about ten physical users of the solution. The video editing guys are big users because we wanted to get their data archived first, since they're the biggest utilizers of our enterprise storage. Our marketing department has a huge amount of digital assets: catalog images, web images. Those are probably the two biggest. Then there's our IT department, because of the backup target, as we do all of the protection of our enterprise systems as well, including Oracle Databases.