Red Hat Ceph Storage Review

Provides block storage and object storage from the same storage cluster.


How has it helped my organization?

Ceph has helped our organization provide a software-defined storage solution in our private cloud.

What is most valuable?

The ability to provide block storage and object storage from the same storage cluster is very valuable for us.

We are using Ceph as back-end storage for our OpenStack cloud. Ceph provides:

  • Block storage for OpenStack images and VM templates (the Glance image service)
  • Block storage for the OpenStack Cinder volume service
  • Block storage for boot volumes in the OpenStack Nova compute service
  • Object storage for the OpenStack Swift service

Without Ceph, we would have ended up with at least two storage systems: one for block storage and another for the Swift object store.
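To illustrate what "both interfaces from one cluster" looks like in practice, here is a minimal sketch using the python-rados and python-rbd bindings. The pool names 'volumes' and 'objects' are hypothetical, and the sketch assumes a reachable cluster with a standard /etc/ceph/ceph.conf:

    import rados
    import rbd

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Block storage: create a 1 GiB RBD image in a (hypothetical) 'volumes' pool.
    ioctx = cluster.open_ioctx('volumes')
    rbd.RBD().create(ioctx, 'demo-volume', 1024 ** 3)
    ioctx.close()

    # Object storage: store a blob directly in a (hypothetical) 'objects' pool.
    ioctx = cluster.open_ioctx('objects')
    ioctx.write_full('demo-object', b'same cluster, different interface')
    ioctx.close()

    cluster.shutdown()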

The other big advantage is that Ceph is free software. Compared to traditional SAN-based storage, it is very economical.

What needs improvement?

Ceph does not handle certain kinds of network failures and individual storage node failures very well, and it can take a long time to recover from them.

I believe the community that supports Ceph is working on this. Newer releases such as Jewel, and upcoming technologies such as BlueStore and RDMA, should improve these issues.
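In the meantime, one common mitigation for short, planned node outages is to set the cluster's noout flag so Ceph does not start rebalancing while a node is briefly down. A minimal sketch using the python-rados binding (the same flag can be set with the ceph CLI):

    import json
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # 'noout' stops Ceph from marking down OSDs as "out", which is what
    # triggers rebalancing; useful around short, planned node outages.
    cluster.mon_command(json.dumps({'prefix': 'osd set', 'key': 'noout'}), b'')

    # ... perform the maintenance ...

    # Clear the flag so normal failure handling resumes afterwards.
    cluster.mon_command(json.dumps({'prefix': 'osd unset', 'key': 'noout'}), b'')
    cluster.shutdown()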

What do I think about the stability of the solution?

Stability in a normal operating environment is satisfactory. Improvements would come from better data-rebalancing behavior when the storage cluster is expanded. Currently, cluster expansion is a user-impacting process.
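One way to soften that impact is to bring new disks into the CRUSH map gradually rather than at full weight. A sketch, assuming a hypothetical new osd.12 whose full CRUSH weight would be 1.0, using the python-rados mon_command interface:

    import json
    import time
    import rados

    cluster = rados.Rados(conffile='/etc/ceph/ceph.conf')
    cluster.connect()

    # Raising the CRUSH weight in steps spreads the rebalancing load over
    # time instead of triggering one large data movement.
    for weight in (0.25, 0.5, 0.75, 1.0):
        cluster.mon_command(json.dumps({
            'prefix': 'osd crush reweight',
            'name': 'osd.12',
            'weight': weight,
        }), b'')
        time.sleep(3600)  # crude pacing; in practice, wait for HEALTH_OK

    cluster.shutdown()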

What do I think about the scalability of the solution?

We have not noticed any issues with scalability. In fact, adding more nodes/disks to the cluster improved performance, because Ceph is a native object store that spreads data, and therefore I/O load, across all nodes.
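A toy illustration of why this happens (a plain hash stand-in, not the real CRUSH algorithm): objects are pseudo-randomly mapped across OSDs, so the busiest OSD carries a smaller share of the total load as the cluster grows.

    import hashlib
    from collections import Counter

    def place(obj_name, num_osds):
        # Toy stand-in for CRUSH: hash the object name to pick an OSD.
        digest = hashlib.md5(obj_name.encode()).hexdigest()
        return int(digest, 16) % num_osds

    for num_osds in (4, 8, 16):
        counts = Counter(place('obj-%d' % i, num_osds) for i in range(10000))
        print(num_osds, 'OSDs -> busiest OSD holds', max(counts.values()), 'objects')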

How is customer service and technical support?

We are using the open source version. However, there seem to be many vendors, in addition to Red Hat, who sell or provide support for Ceph.

Which solutions did we use previously?

We used traditional Fibre Channel-based SAN storage before we started using Ceph. The main reasons for switching to Ceph were:

  • Ability to provide block as well as object storage
  • Open source system
  • Scalability: Performance actually improves as the cluster grows

How was the initial setup?

The initial setup required a lot of research and learning to understand Ceph storage's underlying technology. Once we had the right understanding and configurations, it was pretty straightforward.

However, this is not a traditional storage solution. It may not be straightforward for traditional storage administrators, but it is easier for cloud administrators with good Unix/Linux knowledge.

The key things to consider while deploying Ceph, especially for block storage (also known as RBD) are:

  • Use a higher number of disks to get more IOPS. (Ceph is copy-on-write storage, so capacity usage is less of a worry than providing enough IOPS.)
  • Use SSD journal disks to improve write performance. (In fact, with the price of SSD drives coming down, use all-SSD or NVMe+SSD configurations; more IOPS makes a better solution.)
  • Use SSDs for Ceph monitor (MON) nodes.
  • Use networking speeds of at least 20 Gbit/s on all clients as well as Ceph nodes, since this is network-based storage. As you move to all-SSD or NVMe disks, the network needs to keep up.
  • Select the right CRUSH map and placement group (PG) numbers based on your storage pool size and node distribution in the data center (see the PG sizing sketch after this list).
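For PG sizing, a widely cited rule of thumb from the Ceph documentation is to target roughly 100 PGs per OSD, divided by the replica count and rounded up to a power of two. A minimal sketch:

    def suggested_pg_count(num_osds, replicas, pgs_per_osd=100):
        # Rule of thumb: (OSDs * ~100) / replica count, rounded up to a
        # power of two so PGs map evenly across the cluster.
        target = num_osds * pgs_per_osd / float(replicas)
        power = 1
        while power < target:
            power *= 2
        return power

    print(suggested_pg_count(num_osds=40, replicas=3))  # -> 2048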

What's my experience with pricing, setup cost, and licensing?

Pricing/licensing depends on what kind of internal knowledge or expertise exists in your organization about Ceph.

If you don't have the expertise, choose a partner or vendor with proven Ceph expertise in large production environments.

Which other solutions did I evaluate?

We did not evaluate other storage solutions. We spent the time understanding Ceph better to provide a stable solution.

What other advice do I have?

Ceph is open source and there are large organizations running huge Ceph clusters which have published blogs on how they deployed Ceph.

Do your research based on the lessons learned from these users of Ceph to decide on which configuration and architecture to use for Ceph.

As organizations move to Linux container-based technologies and container orchestration frameworks (especially Kubernetes), Ceph remains relevant, as it integrates with these technologies to provide block storage for them as well.

It's ultimately all about IOPS. When a failure occurs, Ceph tries to rebalance data onto the surviving nodes, which can consume a lot of IOPS and affect client I/O. Without enough IOPS or fast data rebalancing, the rebalance can take a long time. Some of this can be improved today with faster networks and faster drives such as SSDs or flash (even on older Ceph versions); other improvements will come from how Ceph writes data with BlueStore and from replicating and rebalancing data between OSD nodes using RDMA, which may become stable in newer versions.
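Until then, rebalancing pressure can be throttled so that client I/O keeps most of the IOPS. A sketch that shells out to the ceph CLI; the values are illustrative, not recommendations:

    import subprocess

    # Limit concurrent backfill/recovery work per OSD so a rebalance does not
    # starve client I/O; tune the numbers for your own hardware.
    subprocess.run([
        'ceph', 'tell', 'osd.*', 'injectargs',
        '--osd-max-backfills=1', '--osd-recovery-max-active=1',
    ], check=True)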

Disclosure: I am a real user, and this review is based on my own experience and opinions.