Publication

Improving Large-scale Storage System Performance via Topology-aware and Balanced Data Placement...

by Feiyi Wang, Hakki S Oral, Saurabh Gupta, Devesh Tiwari, Sudharshan S Vazhkudai
Publication Type
Conference Paper
Publication Date
Page Numbers
656 to 663
Conference Name
The 20th IEEE International Conference on Parallel and Distributed Systems (ICPADS 2014)
Conference Location
Hsinchu, Taiwan
Conference Sponsor
IEEE
Conference Date
-

With the advent of big data, the I/O subsystems of large-scale compute
clusters are becoming a center of focus, with more
applications putting greater demands on end-to-end I/O performance. These
subsystems are often complex in design. They comprise multiple hardware and
software layers to cope with the increasing capacity, capability and scalability
requirements of data-intensive applications. The shared nature of storage
resources and the intrinsic
interactions across these layers make it a great challenge to realize
user-level, end-to-end performance gains.

We propose a topology-aware resource load balancing strategy to improve
per-application I/O performance. We demonstrate the effectiveness of
our algorithm on an extreme-scale compute cluster, Titan, at the Oak Ridge
Leadership Computing Facility (OLCF). Our experiments with both synthetic
benchmarks and a real-world application show that, even under congestion, our
proposed algorithm can improve large-scale application I/O performance
significantly, resulting in both reduced application run times and
higher-resolution simulation runs.