Publication

Characterizing Application Runtime Behavior from System Logs and Metrics...

by Raghul Gunasekaran, David A Dillow, Galen M Shipman, Richard Vuduc, Edmond Chow
Publication Type
Conference Paper
Publication Date
Conference Name
Workshop on Characterizing Applications for Heterogeneous Exascale Systems
Conference Location
Tucson, Arizona, United States of America
Conference Sponsor
ACM
Conference Date

Large-scale systems are heavily shared resource environments in which a mix of applications is launched concurrently, competing for network and storage resources. Characterizing the runtime behavior of these applications is essential for provisioning system resources and for understanding the impact on application performance when applications compete for resources. In this paper, we study the use of zero- and low-overhead system logs and other system metric data for characterizing the runtime behavior of several applications. We present our preliminary work on estimating individual applications' I/O demands by observing file system usage patterns over multiple runs, and on interpreting applications' network utilization characteristics by observing link-layer error logs. We also present preliminary findings on using such information to make context-sensitive scheduling decisions that minimize potentially negative interactions between applications competing for shared resources. Our analysis is based on four months of system log data collected on one of the world's largest supercomputing facilities, the Jaguar XT5 petaflop system at Oak Ridge National Laboratory.