Evaluating the performance and scalability of the Ceph distributed storage system

As data volumes in every field continue to grow, storage systems must scale to meet increasing demands for performance, reliability, and fault tolerance, which in turn raises their complexity and cost. Improving the performance and scalability of storage systems while keeping costs low is therefore crucial. Ceph, the open-source storage system evaluated in this study, promises to store data reliably across many distributed nodes and targets commodity hardware. We investigate how Ceph performs in different setups and compare the results with the theoretical maximum performance of the hardware. Using a bottom-up approach, we benchmarked Ceph at different architectural levels and varied the number of storage nodes and clients to test the system's scalability. Our experiments show that Ceph delivers the promised scalability, and they uncovered several points with potential for improvement. We observed a significant increase in write throughput when moving the Ceph journal to a faster location (in memory). Moreover, while the system scaled with an increasing number of clients operating on the cluster, we noticed a slight performance degradation beyond the saturation point. We tested two optimisation strategies, increasing the available RAM and increasing the object size, and measured write-throughput gains of up to 9% and 27%, respectively. Our findings improve the understanding of Ceph and should benefit future users through the presented strategies for tackling various performance limitations.
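To make the journal-relocation strategy mentioned in the abstract concrete, the sketch below shows how the OSD journal location could be configured in a FileStore-era Ceph deployment. This is an illustrative configuration fragment, not the exact setup used in the study; the tmpfs mount point and journal size are assumptions.

```
[osd]
# Illustrative only: place the journal on a tmpfs (in-memory) mount
# instead of the default on-disk location. The path is an example.
osd journal = /mnt/tmpfs/osd.$id.journal
# Journal size in MB; the value shown is an assumption.
osd journal size = 1024
```

Note that a journal held in memory is volatile, so a configuration like this trades durability for write throughput and is suitable mainly for benchmarking, as in the experiments described here.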