Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System...

Publication Type

Conference Paper

Journal Name

2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)

Book Title

2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)

Publication Date

June, 2017

Page Numbers

1022 to 1031

Publisher Location

New Jersey, United States of America

Conference Name

2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)

Conference Location

Atlanta, Georgia, United States of America

Conference Sponsor

IEEE

Conference Date

Jun 5, 2017 - Jun 8, 2017

Abstract

With the increasing scale and intensity of the parallel I/O workloads generated by scientific applications running on high performance computing (HPC) facilities, understanding I/O dynamics, especially the root causes of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analysis of these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root causes of I/O performance variability. We then leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation demonstrates that the proposed approach reduces the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.
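The abstract does not specify the striping scheme used. As a hedged illustration only, the Python sketch below assumes a Lustre-style round-robin (RAID-0) layout, in which a file is split into fixed-size stripes that are assigned cyclically to `stripe_count` storage targets starting at `start_target`. The function names and parameters are hypothetical and do not come from the paper. The sketch shows how contiguous per-process writes to a shared file map onto storage targets, which determines how many processes contend for each target.

```python
from collections import Counter


def stripe_target(offset: int, stripe_size: int, stripe_count: int,
                  start_target: int = 0) -> int:
    """Index of the target holding byte `offset` under assumed round-robin striping."""
    if offset < 0 or stripe_size <= 0 or stripe_count <= 0:
        raise ValueError("need offset >= 0, stripe_size > 0, stripe_count > 0")
    stripe_index = offset // stripe_size  # which stripe the byte falls in
    return (start_target + stripe_index) % stripe_count  # cyclic assignment


def targets_touched(offset: int, length: int, stripe_size: int,
                    stripe_count: int, start_target: int = 0) -> set:
    """Set of targets touched by a write of `length` bytes starting at `offset`."""
    if length <= 0:
        return set()
    first = offset // stripe_size                 # first stripe written
    last = (offset + length - 1) // stripe_size   # last stripe written
    return {(start_target + i) % stripe_count for i in range(first, last + 1)}


def per_target_load(num_procs: int, block_size: int, stripe_size: int,
                    stripe_count: int, start_target: int = 0) -> Counter:
    """Count how many processes write to each target when rank r writes
    `block_size` bytes at offset r * block_size of a shared file (hypothetical pattern)."""
    load = Counter()
    for rank in range(num_procs):
        for t in targets_touched(rank * block_size, block_size,
                                 stripe_size, stripe_count, start_target):
            load[t] += 1
    return load


if __name__ == "__main__":
    MiB = 1 << 20
    print(dict(per_target_load(8, MiB, MiB, 4)))
```

For example, `per_target_load(8, 1 << 20, 1 << 20, 4)` places two of the eight 1 MiB writes on each of the four targets. Changing the stripe size, stripe count, or starting target changes that distribution, which is one lever a placement-aware middleware could tune.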