Publication

Comprehensive Measurement and Analysis of the User-Perceived I/O Performance in a Production Leadership-Class Storage System...

Publication Type
Conference Paper
Journal Name
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Book Title
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Publication Date
Page Numbers
1022 to 1031
Issue
0
Publisher Location
New Jersey, United States of America
Conference Name
2017 IEEE 37th International Conference on Distributed Computing Systems (ICDCS)
Conference Location
Atlanta, Georgia, United States of America
Conference Sponsor
IEEE
Conference Date
-

As the scale and intensity of the parallel I/O workloads generated by scientific applications running on high-performance computing (HPC) facilities increase, understanding I/O dynamics, especially the root cause of I/O performance variability and degradation in HPC environments, has become critical to the HPC community. In this paper, we run extensive I/O measurement tests on a production leadership-class storage system to capture the performance variability of large-scale parallel I/O. Analyzing these results and their statistical correlations reveals valuable insights into the characteristics of the storage system and the root cause of I/O performance variability. Further, we leverage these findings to propose a refactored I/O middleware design that improves parallel I/O performance by optimizing data striping and placement. Our preliminary evaluation results demonstrate that the proposed approach can reduce the average per-process write latency by at least 80% and the maximum per-process write latency by at least 20%.