Abstract
Application performance models give designers of high performance computing (HPC) systems insight into the role that subsystems such as the processor or the network play in determining application performance, and they allow HPC centers to target procurements more accurately to resource requirements. Performance models can also be used to identify application performance bottlenecks and to educate users on scalability issues. The suitability of a performance model for a particular performance investigation, however, is a function of both the accuracy and the cost of the model.
A semi-empirical model developed in an earlier publication for an astrophysics application was shown to be inaccurate when predicting communication cost for large numbers of processors. It was hypothesized that this deficiency was due to the model's inability to adequately capture communication contention (threshold effects) as well as other unmodeled components such as noise and I/O contention. In this paper we present a new approach that captures these unknown features to improve the predictive capabilities of the model. We adopted a black-box model error correction procedure that uses evolutionary algorithms to find an error correction component to augment the existing model. Four variations of this procedure were investigated, and all were shown to produce better results than the existing model.
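
As an informal illustration of the general idea of augmenting an existing model with an evolved error correction term, the following minimal Python sketch fits an additive correction to the residual of a base model using a simple (1+λ) evolution strategy. It is not the procedure evaluated in this paper: the base model, the correction features, the synthetic timing data, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
import math
import random


def base_model(p):
    # Hypothetical stand-in for an existing semi-empirical communication model.
    return 1.0 + 0.5 * math.log2(p)


def features(p):
    # Assumed correction features; the real correction form is not given here.
    return (math.log2(p), p / 1024.0)


def correction(w, p):
    # Additive error correction: weighted sum of the assumed features.
    return sum(wi * fi for wi, fi in zip(w, features(p)))


def mse(w, procs, measured):
    # Mean squared error of the augmented model (base + correction).
    errs = [(base_model(p) + correction(w, p) - t) ** 2
            for p, t in zip(procs, measured)]
    return sum(errs) / len(errs)


def evolve(procs, measured, generations=200, offspring=20, sigma=0.1, seed=0):
    # (1+lambda) evolution strategy: keep the parent unless a child is better.
    rng = random.Random(seed)
    parent = [0.0, 0.0]  # zero weights reproduce the uncorrected base model
    parent_err = mse(parent, procs, measured)
    for _ in range(generations):
        children = [[wi + rng.gauss(0.0, sigma) for wi in parent]
                    for _ in range(offspring)]
        best = min(children, key=lambda c: mse(c, procs, measured))
        best_err = mse(best, procs, measured)
        if best_err < parent_err:
            parent, parent_err = best, best_err
    return parent, parent_err


# Synthetic "measurements": base model plus an unmodeled term (illustration only).
procs = [64, 128, 256, 512, 1024, 2048]
measured = [base_model(p) + 0.3 * math.log2(p) + 2.0 * (p / 1024.0)
            for p in procs]

base_err = mse([0.0, 0.0], procs, measured)
weights, corrected_err = evolve(procs, measured)
print(f"base MSE = {base_err:.4f}, corrected MSE = {corrected_err:.4f}")
```

Because the search starts from zero correction weights and only accepts improvements, the augmented model in this sketch can never fit the training data worse than the base model. Whether such a correction generalizes to unseen processor counts is a separate question that this sketch does not address.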