Publication

Slow Nodes Cost Your Users Valuable Resources. Can You Find Them?...

by Ricky A Kendall, Don E Maxwell, Jeff Becklehimer, Cathy Willis
Publication Type
Conference Paper
Publication Date
Conference Name
Cray User Group: Compute the Future
Conference Location
Atlanta, Georgia, United States of America
Conference Sponsor
Cray User Group
Conference Date

Many high-performance computing applications use a static load balance, which is easy and cheap to implement. When one or a few nodes are not performing properly, the whole code slows down to the rate of the slowest node. We describe the use of a code called Bugget, which has been used on Catamount and the Cray Linux Environment to quickly identify these nodes so they can be removed from the user pool until the next appropriate maintenance period.
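
The effect described in the abstract can be illustrated with a minimal sketch. This is not Bugget and does not reflect how Bugget works internally. It assumes each node has already run the same fixed benchmark and reported a wall-clock time. The node names, the timings, the median baseline, and the 10% tolerance are all hypothetical choices made for illustration.

```python
from statistics import median


def job_time(node_times):
    """Under a static load balance every node gets the same amount of work,
    so the job finishes only when the slowest node finishes."""
    return max(node_times.values())


def flag_slow_nodes(node_times, tolerance=0.10):
    """Return the nodes whose benchmark time exceeds the median time
    by more than `tolerance` (a fraction; 0.10 is an assumed value)."""
    baseline = median(node_times.values())
    cutoff = baseline * (1.0 + tolerance)
    return sorted(name for name, t in node_times.items() if t > cutoff)


# Hypothetical per-node timings in seconds for an identical benchmark.
timings = {
    "nid00001": 10.0,
    "nid00002": 10.1,
    "nid00003": 9.9,
    "nid00004": 14.2,  # one underperforming node
}

slow = flag_slow_nodes(timings)
healthy = {name: t for name, t in timings.items() if name not in slow}

print("flagged nodes:", slow)
print("job time with all nodes:", job_time(timings))
print("job time without flagged nodes:", job_time(healthy))
```

In this sketch one slow node sets the completion time for the whole job, so removing it from the pool restores the time expected from the healthy nodes. The median is used as the baseline because a few outliers do not shift it much.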