
Reproducibility responsibilities in the HPC arena...

by Mark R. Fahey, Robert McLay
Publication Type
Conference Paper
Conference Name
XSEDE14
Conference Location
Atlanta, Georgia, United States of America

Expecting bit-for-bit reproducibility in the HPC arena is not feasible because of ever-changing hardware and software. No user's application is an island; it lives in an "HPC" ecosystem that changes over time. Old hardware stops working, and even old software won't run on new hardware. Further, software libraries change over time, whether in their internals or their interfaces. So bit-for-bit reproducibility should not be expected. A more reasonable expectation is that results are reproducible within error bounds, or that the answers are "close" (which is a debate in its own right).
For a researcher to reproduce their own results, or the results of others, within some error bounds, there must be enough information to recreate all the details of the experiment. This requires complete documentation of every phase of the researcher's workflow: from code to versioning to programming and runtime environments to publishing of data. This argument is the core statement of the 2009 Yale Declaration on Reproducible Research [1]. Although the HPC ecosystem is often outside the researcher's control, the application code could be built almost identically, and there is a chance for "very similar" results that differ only by round-off error. To achieve complete documentation at every step, the researcher, the computing center, and the funding agencies all have a role. In this paper, the role of the researcher is expanded upon relative to the Yale report, and the role of the computing centers is described.