Abstract
Ceph is an emerging open-source parallel distributed file and storage system.
By design, Ceph assumes that it runs on unreliable commodity storage and
network hardware, and it provides reliability and fault tolerance through
controlled object placement and data replication.
We evaluated Ceph for scientific high-performance computing (HPC)
environments. This paper presents our evaluation methodology, experiments,
results, and observations, mostly from the perspectives of parallel I/O
performance and scalability. Our work makes two unique contributions. First,
our evaluation was performed under a realistic setup for a large-scale
capability HPC environment using a commercial high-end storage system.
Second, our path of investigation, tuning efforts, and findings contributed
directly to Ceph's development and improved its code quality, scalability, and
performance. These changes should also benefit both the Ceph and HPC
communities at large. Throughout the evaluation, we observed that Ceph is still
an evolving technology that is under fast-paced development and shows great
promise.