What is our primary use case?
Primarily this is used as the backend iSCSI SAN for our Oracle 12c RAC implementation: 2 x 2-node clusters, plus 3 add'l servers (dev/qa/stg). We also now use it for some limited-use VMs (VMware), and have implemented the VVOL configuration that SF makes available. We debated whether to use this for non-prod Oracle data, but two things swayed our opinion: 1) we would not incur a huge disk-space penalty for having dev/qa/stg there, as the de-dupe functions would come into play, and 2) we can guarantee IOPS, so we know that regardless of what we do in dev/qa, it won't incur a performance penalty for production volumes.
How has it helped my organization?
The compression and de-dupe have been great in terms of space savings, especially for our prod/stg/qa/dev DB instances (where you gain add'l savings for the de-duped data); the QoS for IOPS helps us ensure that no non-prod activity can be deleterious to our production-stack data.
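As a rough illustration of the per-volume IOPS guarantee, here is a minimal sketch that only builds a JSON-RPC request body and does not send anything. It assumes SF's API follows the SolidFire Element API, with a `ModifyVolume` method and a `qos` object holding `minIOPS`, `maxIOPS` and `burstIOPS`; verify those names against your own API version. The volume IDs and IOPS figures are hypothetical, not our real settings.

```python
import json


def build_qos_request(volume_id, min_iops, max_iops, burst_iops, request_id=1):
    """Build (but do not send) a JSON-RPC body that sets QoS on one volume.

    Method and parameter names are assumptions based on the Element API.
    """
    if not (0 < min_iops <= max_iops <= burst_iops):
        raise ValueError("expected 0 < min_iops <= max_iops <= burst_iops")
    return {
        "method": "ModifyVolume",
        "params": {
            "volumeID": volume_id,
            "qos": {
                "minIOPS": min_iops,      # floor the volume is guaranteed
                "maxIOPS": max_iops,      # sustained ceiling
                "burstIOPS": burst_iops,  # short-term ceiling
            },
        },
        "id": request_id,
    }


# Hypothetical example: a production volume gets a high floor,
# a dev volume gets a low ceiling so it cannot crowd out production.
prod = build_qos_request(volume_id=101, min_iops=5000, max_iops=15000, burst_iops=20000)
dev = build_qos_request(volume_id=202, min_iops=100, max_iops=1000, burst_iops=2000)
print(json.dumps(prod, indent=2))
```

The point of the design is that the floor on production volumes and the ceiling on dev/qa volumes are set independently, which is what lets non-prod share the same array.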
What is most valuable?
- Expandability (incremental and non-disruptive)
- Compression/Deduplication/thin provisioning
- Recovery from failure/data-protection
- Guaranteed IOPS per volume
- Simple browser web-admin (with an extensive API interface)
What needs improvement?
The level of monitoring could be better. They give you access to stats, which are very informative, but you really need to do your own internal availability monitoring. Perhaps they just assume you are. Part of this may be an adjustment I need to make: because something like a drive failure is handled internally and the data blocks are re-duplicated automatically, a failure somehow becomes less urgent. That is not second nature to me.
Having said that, 1) support reaches out if there is an issue, and 2) the on-line reporting is pretty good and only getting better.
For how long have I used the solution?
One to three years.
What do I think about the stability of the solution?
There were no issues with stability. We've had 1 failed drive so far, and gone through 2 firmware upgrades, including reboots of individual nodes, one at a time, and everything continues to "just work".
What do I think about the scalability of the solution?
There were no issues with scalability. Far from it; see the previous comments.
How are customer service and technical support?
Customer service was good. I haven't needed much so far. We prefer to be our own source of knowledge and reach out to clarify or confirm something.
Technical support is good and helpful. While you can schedule the node S/W upgrades and have them take care of it, I had them walk me through it, as we were in pre-production at the time. Knowing/understanding more about the process gave me a better feeling.
I don't like black boxes, so anything I can understand or wrap my head around provides comfort. The nodes run Ubuntu and leverage the Ubuntu/Debian update mechanisms. These methods are well-known and understood, so no re-inventing the wheel was necessary here.
Which solution did I use previously and why did I switch?
We have some older EMC boxes that were not sufficient to the task. We wanted an all-flash array (AFA).
How was the initial setup?
The setup was quite simple. We had help, but it would not have been required. To date, we've added 2 add'l nodes with no outside assistance.
What about the implementation team?
We implemented in-house, although SF sent a technical staff member out to us. He let us pick his brain and ask questions, which was very helpful.
What's my experience with pricing, setup cost, and licensing?
I believe the initial buy-in/purchase is more expensive, because you are starting out with 4 (minimum) nodes. It then becomes cheaper and easier to expand and grow.
For example, with the more traditional dual-controllers+shelf design, expanding to a new shelf was a pretty big investment (and you needed to fully populate it with drives).
The new shelf also uses the same controllers, so you have added capacity but not performance. Adding another node, by contrast, is a relatively simple operation. You don't even have to add all the drives right away. Licensing is via your support contract.
Which other solutions did I evaluate?
We did an extensive evaluation of several products and vendors, looking at SF, Kaminario, Nimble, Pure Storage, EMC, and HPE.
Price was a factor, but it was not the only factor. We are not a huge shop, but are growing, so we wanted something that had a solid architecture for now and for later.
We wanted it to be as bulletproof as possible, and yet be able to change/grow with us. The more standard dual-controller-with-1-shelf design can survive a controller failure, or 1+ drive failures, but what about a shelf failure? While this is unlikely, it is still a possibility.
With SF, within a few minutes of a drive failure, the data blocks that were located on that drive are re-duplicated elsewhere, and you are fully protected again. And as long as you have sufficient spare capacity, you can lose an entire node with no data loss and reportedly only a small performance hit (even software upgrades are non-disruptive, as they are done 1 node at a time).
That entire node's data is re-duplicated elsewhere on the remaining nodes. If you don't have a node's worth of spare capacity, that becomes more problematic, of course.
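The "node's worth of spare capacity" point can be checked with simple arithmetic. The sketch below is a deliberately simplified model, not SF's actual accounting: it assumes `used_tb` already counts every stored copy of the data, ignores metadata and other overhead, and uses made-up node sizes.

```python
def survives_node_loss(node_capacities_tb, used_tb):
    """Return True if the stored data still fits after the largest node is lost.

    Simplified model: used_tb is the total space consumed (all copies),
    and the worst case is losing the node with the most capacity.
    """
    if len(node_capacities_tb) < 2:
        return False  # nowhere to re-duplicate the data
    remaining_tb = sum(node_capacities_tb) - max(node_capacities_tb)
    return used_tb <= remaining_tb


# Hypothetical 4-node cluster with 10 TB usable per node.
nodes = [10, 10, 10, 10]
print(survives_node_loss(nodes, used_tb=28))  # fits in the remaining 30 TB
print(survives_node_loss(nodes, used_tb=32))  # does not fit in 30 TB
```

In practice you would want some margin below that limit, since a cluster that is exactly full after a node loss has no room left for a further drive failure.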
What this also means is, as you add nodes, for increases in both capacity and performance, a.k.a. the scale-out model, you also get faster recovery times in case an entire node fails.
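To see why more nodes should mean faster recovery, here is a back-of-the-envelope model. It assumes every surviving node takes an equal share of the rebuild work at the same rate, which is an idealization; the data size and the per-node rate are hypothetical numbers, not measurements from our cluster.

```python
def rebuild_minutes(data_on_failed_node_gb, surviving_nodes, per_node_gb_per_min):
    """Estimate rebuild time when the survivors share the work evenly."""
    if surviving_nodes < 1 or per_node_gb_per_min <= 0:
        raise ValueError("need at least one surviving node and a positive rate")
    return data_on_failed_node_gb / (surviving_nodes * per_node_gb_per_min)


# Same amount of data on the failed node, different cluster sizes.
for cluster_size in (4, 6, 10):
    minutes = rebuild_minutes(2000, cluster_size - 1, 20)
    print(f"{cluster_size} nodes: about {minutes:.0f} minutes")
```

Under these assumptions the rebuild time falls in proportion to the number of surviving nodes, which matches the scale-out argument above.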
Adding nodes is as simple as:
- Adding a node to the cluster
- Adding the drives.
Data is re-balanced across the new nodes automatically. Removing/Decommissioning a node is just as easy:
- Remove the drives from the cluster
- Allow data to be re-located
- Remove the node from the cluster
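The two sequences above can be sketched as ordered API requests. This only builds the request bodies and sends nothing. The method names (`AddNodes`, `AddDrives`, `RemoveDrives`, `RemoveNodes`) and their parameters are my assumption that SF's API matches the SolidFire Element JSON-RPC API, and the node and drive IDs are hypothetical; check the API reference for your version before relying on any of it.

```python
def add_node_requests(pending_node_id, drive_ids):
    """Requests, in order, to grow the cluster by one node (assumed API names)."""
    return [
        {"method": "AddNodes", "params": {"pendingNodes": [pending_node_id]}},
        {"method": "AddDrives",
         "params": {"drives": [{"driveID": d} for d in drive_ids]}},
    ]


def remove_node_requests(node_id, drive_ids):
    """Requests, in order, to decommission one node (assumed API names).

    Between the two calls you must wait until the data has been
    re-located off the removed drives; that wait is not modeled here.
    """
    return [
        {"method": "RemoveDrives", "params": {"drives": list(drive_ids)}},
        {"method": "RemoveNodes", "params": {"nodes": [node_id]}},
    ]


# Hypothetical IDs for illustration only.
for request in add_node_requests(pending_node_id=7, drive_ids=[71, 72, 73]):
    print(request["method"], request["params"])
for request in remove_node_requests(node_id=3, drive_ids=[31, 32, 33]):
    print(request["method"], request["params"])
```

The ordering is the important part: drives come in after the node when adding, and go out before the node when removing.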
There is another unique option. Let's say I grow to 10 nodes, but the LOB application changes, and the role is no longer the same. I can break that into 2 x 5-node arrays and redeploy in different roles.
Update: since the initial review, we have added two additional nodes. It was very easy to do, and the data re-balancing (distribution) was done automatically.
What other advice do I have?
I'm not sure why SF isn't more popular in the SMB space. To my mind, it offers a unique combination that isn't easily matched in the marketplace. Kaminario seems to be the closest. I haven't had it long enough to truly "know" the product, but will happily revisit this in 6-12 months.
Since the initial rollout, we have implemented VVOLs on SF with our VMware 6 setup. Once set up (the initial configuration and communication, plus the SPBM policies), it is quite easy to use, and it allows the VMware admin to do it all without having to touch the SF webadmin URL; even setting IOPS per volume is done there. Very nice.
Lastly... scaling up, either for perf. or capacity (more likely), is so much of a non-issue that it is hard to over-state:
- predictable cost: you are adding a node, you know how much they cost. No "threshold" where you have to add add'l controllers, or a new shelf, nothing like that
- no (minimal) impact to add to a running system. They _say_ that when data is re-balanced (across the new node(s)), you have a percentage perf. hit, but we have not noticed this (and we've added 2 add'l nodes so far).
- in fact, adding OR removing nodes requires no downtime, literally a 'non-event'
Disclosure: I am a real user, and this review is based on my own experience and opinions.
May 07 2018