Improving Large-scale Storage System Performance via Topology-aware and Balanced Data Placement...

by Feiyi Wang, Hakki S Oral, Saurabh Gupta, Devesh Tiwari, Sudharshan S Vazhkudai

Publication Type

Conference Paper

Publication Date

December, 2014

Page Numbers

656 to 663

Conference Name

The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014)

Conference Location

Hsinchu, Taiwan

Conference Sponsor

IEEE

Conference Date

Dec 16, 2014 - Dec 19, 2014

Abstract

With the advent of big data, the I/O subsystems of large-scale compute
clusters are becoming a center of focus, with more
applications putting greater demands on end-to-end I/O performance. These
subsystems are often complex in design. They comprise multiple hardware and
software layers to cope with the increasing capacity, capability and scalability
requirements of data-intensive applications. The shared nature of storage
resources and the intrinsic
interactions across these layers make realizing user-level, end-to-end
performance gains a great challenge.

We propose a topology-aware resource load balancing strategy to improve
per-application I/O performance. We demonstrate the effectiveness of
our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge
Leadership Computing Facility (OLCF). Our experiments with both synthetic
benchmarks and a real-world application show that, even under congestion, our
proposed algorithm can improve large-scale application I/O performance
significantly, resulting in both reduced application run times and
higher-resolution simulation runs.
