What is our primary use case?
We are using it for storage of video files, with casual access to them. We needed as much storage as we could get for the best price. If you are looking for a hybrid type of situation, when you want low latency for transactional things, and higher-latency storage for archival things, you can get the hybrid nodes.
Each of our two clusters has the same disk sizes, etc. We did that for interchangeability, in case we wanted to move shelves between the clusters. They act independently, but they replicate between the two. We love the system. That's why we continue to upgrade and buy it.
What is most valuable?
The low latency, the high-capacity connections that we have with the nodes, and the ability to add as needed to a particular system, are all important features for us.
It also handles data distribution among the nodes internally. You really don't have to do anything, so management is easy. If you're someone who really wants to get granular and know where every bit or byte is going, maybe it's not for you because I don't know if you can get that granular.
We have over a petabyte of storage and we've sliced it up. You can't really call the slices "shares" because it's not really like an NFS mount or CIFS share. But we've sliced it up, and the policies and auditing on a particular system produce, if anything, too much data. Anytime a file change or any system change happens on it, it records it and we ingest that into a SIEM. We can crunch it so we know who is changing what file at what time. That gives us auditing capabilities.
The policy-based management that we have, for who accesses what shares, is relatively simple to set up and manage. It's almost like managing an Active Directory file share.
There are also the policies that you set up on replication and purging files, and policies for something called WORM. That's a "write once, read many," where you can't overwrite certain files or certain data. It puts them in a "protected mode" where it becomes very difficult for someone to accidentally delete. We use that for certain files or certain directories, because we're dealing with video and some video has to be protected for chain-of-custody purposes. The WORM feature works great.
The OneFS file system is very simple and has an astronomical number of features that allow us to get very granular with permissions, policies, and archiving of data. It handles everything for you. It's one of the easiest storage solutions that we've ever implemented in the 12 years I've been working in this organization.
I also love the snapshot functionality. It's pretty much what everyone does in backup. It's a backup of your system, but it lets you set the frequency of the snapshots. That's very important to us because we take so many snapshots. That means we can recover up to six months back, if somebody makes a file change or deletes a file. It's like a versioning type of function. It probably isn't really special. A lot of backup software has it. But the snapshot functionality is what we utilize the most within the OneFS file system. In theory, you don't really have to back up your systems if you're taking snapshots.
What needs improvement?
The only problem with the WORM (write once, read many) feature is it does take up more space than if you just wrote a file, because it writes stuff twice. But it works for us for chain-of-custody scenarios, and it's built into the file system itself.
Also, on the PowerScale system, because of the magic that it does "under the hood," it is very difficult to find out within the system where all your storage is going. That's a little bit of a ding that we have on it. It does so much magic to protect itself from drive failures, or multiple drive failures, that it automatically handles the provisioning and storage of your data. But because of that, it is very difficult to work out why a file or a directory of a certain size is using more storage than is being reported in InsightIQ. It's the secret sauce of protecting your data, and that makes it a little disconcerting for someone who is used to seeing that a directory is using exactly 5 MB of space. So if you have a directory using a terabyte of space, it might be using a little bit more because of the way that the system handles data protection. That is something you have to get used to.
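To make that gap a little more concrete, here is a rough sketch of the arithmetic, assuming a generic erasure-coded layout where every group of data units is stored alongside some extra protection units. The function names and the 8 + 2 split are made up for illustration. This is not the actual OneFS layout, which depends on the protection level and file size configured on the cluster.

```python
# Generic protection-overhead arithmetic (an assumption, not the OneFS algorithm).

def physical_bytes(logical_bytes: int, data_units: int, protection_units: int) -> float:
    """Estimate the raw space consumed when each stripe holds `data_units`
    of real data plus `protection_units` of redundancy."""
    if data_units <= 0 or protection_units < 0:
        raise ValueError("data_units must be > 0 and protection_units >= 0")
    return logical_bytes * (data_units + protection_units) / data_units


def overhead_percent(data_units: int, protection_units: int) -> float:
    """Extra space used, as a percentage of the logical size."""
    if data_units <= 0 or protection_units < 0:
        raise ValueError("data_units must be > 0 and protection_units >= 0")
    return 100 * protection_units / data_units


if __name__ == "__main__":
    one_tb = 10 ** 12  # a directory that reports 1 TB of logical data
    # Hypothetical stripe: 8 data units + 2 protection units.
    print(physical_bytes(one_tb, 8, 2))  # raw bytes consumed on disk
    print(overhead_percent(8, 2))        # extra space in percent
```

With those placeholder numbers, a directory that reports 1 TB would occupy 1.25 TB of raw disk, which is the kind of difference we see between the reported size and what is actually consumed.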
Also, a lot of people are not used to the tagging or the description in the InsightIQ application. We're used to using the normal nomenclature of terabyte, petabyte, etc. They utilize tebibyte and pebibyte. So you have to understand the difference when InsightIQ is telling you how much storage you have. It's different from what we're used to. It uses base-2 and the world is used to base-10. Discerning how much storage you actually have, from the information in InsightIQ, takes a little bit of math, but it's not very difficult. I wish they had an interface in there where you could click and it would report in the way the industry is used to, which is in terabytes and petabytes. It's nothing major, just something you have to get used to when you're looking at it.
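The math is roughly the following, assuming the base-2 figures are tebibytes (2^40 bytes) and pebibytes (2^50 bytes) and you want decimal terabytes and petabytes. The helper names are my own, not anything from InsightIQ.

```python
# Convert base-2 storage units to the base-10 units most people expect.
TIB = 2 ** 40   # bytes in one tebibyte
PIB = 2 ** 50   # bytes in one pebibyte
TB = 10 ** 12   # bytes in one terabyte
PB = 10 ** 15   # bytes in one petabyte


def tib_to_tb(tib: float) -> float:
    """Tebibytes (base-2) to terabytes (base-10)."""
    return tib * TIB / TB


def pib_to_pb(pib: float) -> float:
    """Pebibytes (base-2) to petabytes (base-10)."""
    return pib * PIB / PB


if __name__ == "__main__":
    print(round(tib_to_tb(1), 3))  # 1.1
    print(round(pib_to_pb(1), 3))  # 1.126
```

So a base-2 figure is about 10 percent larger than it looks at the terabyte level and about 12.6 percent larger at the petabyte level.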
For how long have I used the solution?
We have two clusters. We purchased our first cluster about seven or eight years ago. We've refreshed that particular cluster, where we traded in the old one and bought a whole new cluster. In the midst of that purchase, we also bought a second cluster where we replicate some files between the two. We just refreshed and upgraded that second cluster, which was probably about five or six years old, and bought a whole new set of A200 nodes for it, so the shelf sizes are the same.
What do I think about the stability of the solution?
We've had some bumps and bruises when buying new nodes and adding them to the cluster, but I don't think it was the technology that we really had the problems with. It was, unfortunately, Dell EMC support, where we got a couple of Dell EMC engineers who weren't as familiar with the system as we'd like. Once we kicked it up the chain, and we had an engineer that was more versed, they fixed the problem relatively fast.
When we had the first iteration of PowerScale seven years ago, we added nodes to that. This was how that process went: The node came in, it was already populated with drives, you slapped it in, put it into the rack, cabled it up to the networking, and put the networking on the same VLAN, the network backend configuration. Then, you went into the configuration manager, the OneFS file system, and you told it about the node. You said, "I have a node that I want to join to the cluster." It brought the node in and, for lack of a better term, formatted it, added it to the array, and it was there. The amount of time it took to cable up and join that node was about two hours. Once it's there, the storage just expands.
In theory, what we expected with the newer systems when adding nodes was the same scenario, and this is the way it does work, once they figured out the problem that they were having. You rack the system. If you get the networking done right, which is really easy (you just drop it on), it handles a lot of the internal networking within the cluster itself, but you need to put it on the same external VLAN. If you do that right, the OneFS file system just finds it. You add it, and it just assimilates it into the cluster. Once the networking is done, it should take under an hour for the node to get assimilated into the cluster and for the storage to become available.
Most of the problems we had were when we were adding on. We really haven't had any problems after it was up and running. When it's up and running, it's rock-solid. We never really get failures other than drives failing, because all SATA drives fail. But you just pull out a drive and you slap another one in.
What do I think about the scalability of the solution?
We were using it for video storage and we were pretty impressed with its scale-up and scale-out abilities. We are always looking at the ability of a platform for scaling up and scaling out, especially because it's file storage. This was the best thing on the block that was out there.
How are customer service and technical support?
In recent months, their backend technical support has waned a little bit. They need to address the first-line technical support. I used to have a lot of confidence in Dell EMC technical support, but since COVID—and maybe it's the COVID situation—the technical support has fallen short a little bit. We've run into some problems with them.
They stand behind their product. The support that I get from my support group and my enterprise management team is phenomenal. When there's a problem, they address it. It may take them a little bit of time, but they own up to it.
But calling in and getting that first-line technical support needs to be addressed. It's been a little bit of a "hunt and peck" when you have issues, as opposed to just coming up with the actual solution to a problem. That's only been the case in about the last nine months or a year. I continue with Dell EMC because when there's an issue, they back it up and they make it right.
How was the initial setup?
It's one of the easiest things to configure. It's pretty much set-it-and-forget-it.
Initially, because in the first system that we had seven years ago the drive space was so small—I think they were 4 TB drives—there were a lot of shelves. We had over a petabyte of storage, so it was a lot of shelves. The installation, physically, was what took a really long time.
Now, the drive size is much bigger and the density per shelf is much greater. The actual shelf count is a lot smaller, so the physical racking is a lot easier. When we switched over to the new A200 nodes, we went from four nodes to one, four shelves to one shelf, when we did the conversion.
With the initial install, it has to format all the drives and that can take some time. It was a long time ago so I'm not sure I remember correctly, but I believe it took us a day or two to format all the drives. But we had 12 shelves. After that we were fine.
But when you add on, it just brings them up and formats them into the array, relatively quickly. The initial one, depending on how many shelves you have, can take hours, and up to a day, to format everything.
The second installation that we did was a lot quicker. We stood it up, had those initial problems adding the nodes, but then we had to move it because we had to move data centers. When we moved it, it took less than half a day. We actually had to shut it down to move it out of a data center into another data center. We carried it over to the new data center, rack mounted it, fired the thing up, and it just took off like it hadn't even been moved. It handled a good "power-down" situation with no issues.
What about the implementation team?
It was done with two guys from Dell EMC and one of my system engineers. The network guy did some backend configurations. The two guys from Dell EMC came because they were physically mounting all that stuff. When we added the second one they sent two guys, but one guy pretty much just sat around and did nothing while the other did the hands-on-the-keyboard stuff. I had a system guy down there to help with how we wanted it configured. But it's relatively simple.
Overall, the first deployment was phenomenal. Everything worked out great. The training, what they conveyed to us and walked us through, that was phenomenal. The second deployment, on the second array—same thing, when we were running with the older nodes.
Then we did the transition where we swapped out to the A200 nodes. Once again, phenomenal, everything worked out great. When we got the A200 nodes for the second cluster and upgraded them, the installation of that went fine.
When we started adding shelves, that's when the technical support fell on its face because the individuals that were working with it were not well-versed enough. I guess they assumed—and it's how it should be—that when you add a node, it's just rack it and stack it and then turn it on. But it didn't go that easily. There was some low-level engineering trick that you needed to know about, and these particular individuals didn't know about it. They do now, because we had to escalate it. The escalation was a little frustrating because it took about two days to get to the right person. But that right person knew the answer in five minutes.
What was our ROI?
We did an analysis of using cloud storage and on-prem storage. We did a comparison of the total cost of ownership between the two. Every time we have done it, the cost of onsite storage using the PowerScale system is fractions of a penny, per gigabyte, compared to cloud storage. There are no access fees or access charges like you get with cloud storage. If you want to utilize cloud storage, there are retrieval costs sometimes. I know there are different levels of cloud storage where you can archive and then pull up, but it takes about a day to get them to pull that stuff out of archive, and then you can access it. But there's also those access charges. You don't get that with the PowerScale system.
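A comparison like that can be sketched roughly as below. Every name and number here is a placeholder rather than our actual figures or anyone's real price list, and the sketch leaves out things like power, cooling, and staff time that a full total-cost-of-ownership analysis would include.

```python
# Rough per-gigabyte cost comparison; all inputs are hypothetical placeholders.

def on_prem_cost_per_gb_month(purchase_cost: float, yearly_support_cost: float,
                              usable_gb: float, years: int) -> float:
    """Spread purchase and support costs over the usable capacity and lifetime."""
    total = purchase_cost + yearly_support_cost * years
    return total / (usable_gb * years * 12)


def cloud_cost_per_gb_month(storage_rate: float, retrieval_rate: float,
                            retrieved_fraction: float) -> float:
    """Monthly storage rate plus retrieval charges for the share of data read back."""
    return storage_rate + retrieval_rate * retrieved_fraction


if __name__ == "__main__":
    # Placeholder inputs only, chosen to keep the arithmetic easy to follow.
    on_prem = on_prem_cost_per_gb_month(120_000, 0, 1_000_000, 5)
    cloud = cloud_cost_per_gb_month(0.02, 0.01, 0.5)
    print(f"on-prem: {on_prem:.4f} per GB per month")
    print(f"cloud:   {cloud:.4f} per GB per month")
```

The point of the second function is that retrieval charges scale with how much you read back, which is the piece that does not exist for on-prem storage.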
What's my experience with pricing, setup cost, and licensing?
We're at the A200 version, which is more for online archiving. It's storage-based, but they're called archive nodes. They're all SATA spinning disks. If you need a lot of storage at an economical price, and you're not doing transactional stuff that needs really high speed, they have these archive nodes. The PowerScale A200 is more like an online archival system where the nodes are there but you're actively addressing them. It stores your files on spinning disk so you get tons of storage for a good price.
What other advice do I have?
Networking can get a little confusing. The big thing is to make sure you carve out your VLANs for this particular system. Put a lot of thought into the network aspect of it. Don't just slap it into your server network. Carve out an isolated network for your storage subsystems and make sure they have high-speed paths back to wherever you're going to be accessing it from. Don't cheap out on that because this system scales out and scales up. If you start cheaping out on the network part of it, you're not going to be happy with your access to it. The biggest thing is to configure the networking right and give it the unobstructed paths that it needs to realize the low-latency, scale-out aspect of the system itself. You can jam yourself up if you neglect the networking aspect of it.
The A2000 system they have now, which we didn't even look into, is more of a non-active archival type system. They also have these hybrid systems where you would have staging areas where you could store on spinning disks and tier. Your storage becomes a tiered storage infrastructure where you have spinning and flash storage. You can put your high access, low latency stuff on your flash storage, and your archival, higher latency stuff, on the spinning disks of the hybrid nodes. We were looking at that, but we're not using this particular system as a low latency, production-type system.
They also have the all-flash arrays, which is where you're getting massive amounts of throughput but it's just expensive, obviously, because it's flash. It's a lot more money. We weren't looking into that because we did not need speed. We were just looking for storage options. We have a different Dell EMC product that we use for our day-to-day, low latency, server-based storage. That's where our block storage is. Our file storage is what we use the PowerScale for. We didn't want to go to the all-flash array nodes. They're not cheap and we already had a solution in place for that.
Overall, the hardware itself, and the OneFS file system, are the best selling points, combined with the delivery and the installation. That's why I continue to buy Dell EMC.
Which version of this solution are you currently using?