Microsoft Storage Spaces Direct Review
There were many situations I put myself in while testing, and the data was never at risk


What is our primary use case?

Full production. This is now our primary storage and hypervisor (hyper-converged) solution for our primary cluster.

How has it helped my organization?

The performance is immediately noticeable by end users of all our on-premises applications. The cost (we were a Hyper-V customer anyway) was significantly lower than anything comparable.

What is most valuable?

Resiliency. There were many situations I put myself in while testing, and the data was never at risk.

What needs improvement?

RDMA ease of deployment. The performance benefits only came with all of the new technology, and RDMA was not only a hard requirement but also the piece that was most challenging to be 100% confident in. We used RoCEv2 and switched to iWARP a year later.

To expand on our challenges: the hosts are connected via multiple 40Gb links to Cisco Nexus 9396 switches with vPC. We had a lot of Fibre Channel experience in the past, but running storage over Ethernet was a change we didn't have much practical experience with. Microsoft strongly recommends using RDMA, and we decided on RoCEv2. After it was all set up, the performance counters confirmed that RDMA was being used, but that doesn't mean DCB was working 100% correctly.

There aren't many good articles on end-to-end PFC and DCB configuration, because it depends on your NICs, host OS, switches, and so on. Piecing together learnings from Mellanox, Microsoft, and Cisco documents, we believed we had it configured correctly, but we never had 100% confidence, and it is very difficult to find a partner willing to put a stamp of certification on the whole stack (Cisco vPC, DCB/DCBX, LLDP, PFC, SMB Multichannel, RDMA, etc., all in the mix).

When we experienced some unexplained issues that pointed to intermittent network problems, with some errors suggesting RDMA could be involved, they were difficult to troubleshoot. When we switched to LACP with vPC (which doesn't work with RDMA/RoCE, so we disabled RDMA), the issues didn't recur, but performance became much less consistent. When we switched to iWARP, performance was reliably good again and the issues didn't recur. It's hard to be sure where the problem was; my gut says it was the PFC configuration on the Cisco switches, and with iWARP, DCB doesn't need to be 100% right, because it uses TCP rather than PFC to tolerate certain network conditions. I think we would have seen similar issues with vSAN, but I can't be certain; it may be more tolerant of the edge cases.
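For anyone facing the same uncertainty, the host-side checks we leaned on can be sketched with the in-box Windows Server cmdlets. This is a minimal diagnostic sketch, not our exact configuration; the priority value mentioned in the comments is a common convention, not a requirement:

```powershell
# Confirm the NICs report RDMA as enabled
Get-NetAdapterRdma | Format-Table Name, Enabled

# Confirm SMB sees the interfaces as RDMA-capable
Get-SmbClientNetworkInterface | Format-Table InterfaceIndex, RdmaCapable, LinkSpeed

# Confirm active SMB Multichannel connections are RDMA-capable
Get-SmbMultichannelConnection | Format-Table ClientInterfaceIndex, ServerInterfaceIndex, ClientRdmaCapable

# For RoCE, inspect the host-side DCB/PFC configuration
# (SMB Direct traffic is often tagged to priority 3)
Get-NetQosPolicy          # QoS policies tagging SMB traffic
Get-NetQosFlowControl     # PFC must be enabled on the same priority as SMB
Get-NetQosTrafficClass    # bandwidth reservation per traffic class

# Watch live counters to confirm data is flowing over RDMA, not falling back to TCP
Get-Counter '\RDMA Activity(*)\RDMA Inbound Bytes/sec', '\RDMA Activity(*)\RDMA Outbound Bytes/sec'
```

Note that these checks only confirm the host believes RDMA is active and the host-side DCB settings are consistent; they cannot prove the switch-side PFC configuration is correct end-to-end, which was exactly our gap.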

For how long have I used the solution?

One to three years.

What do I think about the stability of the solution?

We had challenges with stability in the first few months while testing S2D, just as we were migrating production to it. It was frustrating at times, and there were learning curves; when you run your storage over the network, you need to be fully confident in every aspect of your cluster's network configuration. I believe we would have had similar challenges with vSAN or any other storage solution that uses the network as the host/storage interconnect. After working with it and learning it more, I have much more confidence in the stability of the product, and with Storage Spaces Direct Ready Nodes from vendors (which didn't exist when we bought and built), it is much easier to become confident quickly.

What do I think about the scalability of the solution?

The only challenge with scalability is that after you add hosts to the cluster, you'll want to create a new volume, live-migrate your workloads to it, and then delete the old volume. Hopefully, this can be automated with "optimize-volume" in the future.
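The rebalancing steps above can be sketched in PowerShell. The pool, volume, VM names, and size here are illustrative placeholders, not values from our environment:

```powershell
# Create a new volume that spans the enlarged pool
# (CSVFS_ReFS is typical for S2D cluster shared volumes)
New-Volume -StoragePoolFriendlyName "S2D on Cluster1" `
           -FriendlyName "Volume02" `
           -FileSystem CSVFS_ReFS `
           -Size 4TB

# Storage-live-migrate each VM's files onto the new CSV
Move-VMStorage -VMName "VM01" `
               -DestinationStoragePath "C:\ClusterStorage\Volume02\VM01"

# Once the old volume is empty, remove it to return its capacity to the pool
Remove-VirtualDisk -FriendlyName "Volume01"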

How is customer service and technical support?

I did have to call support, and Microsoft's frontline support is not adequate if you rely on this product for your business. Microsoft Premier Support is worth it if you are a medium-to-large business or are running mission-critical services on this infrastructure.

Which solutions did we use previously?

We used Dell Compellent as our storage previously.

How was the initial setup?

The documentation when we built (as soon as Server 2016 was RTM) was not as good as it is now, but the setup was still relatively straightforward, with the exception of Cisco/Mellanox RDMA interoperability.

What about the implementation team?

In-house (me).

What was our ROI?

Two years. We were paying huge support fees for storage with our SAN. We manage it all ourselves now.

What's my experience with pricing, setup cost, and licensing?

If you still use VMware as your hypervisor, you should consider Hyper-V. Since 2012 R2, it has been as good as VMware, and with S2D it is much more cost-effective.

Which other solutions did I evaluate?

We didn't hands-on test any other options, but we did research and evaluate alternatives (Nutanix, Nimble, vSAN, etc.).

What other advice do I have?

Stop using mechanical spinning disks; they cost more in the long run given all the performance challenges. Consider iWARP if you don't already have deep experience with successful RoCE (DCB, PFC) deployments.

Disclosure: I am a real user, and this review is based on my own experience and opinions.
