Publication

Ensuring statistical reproducibility of ocean model simulations in the age of hybrid computing...

by Salil Mahajan
Publication Type
Conference Paper
Journal Name
Proceedings of the Platform for Advanced Scientific Computing Conference (PASC)
Book Title
PASC '21: Proceedings of the Platform for Advanced Scientific Computing Conference
Publication Date
Page Numbers
1 to 9
Publisher Location
New York, United States of America
Conference Name
Platform for Advanced Scientific Computing Conference (PASC 21)
Conference Location
Geneva, Switzerland
Conference Sponsor
ACM, SIGHPC, CSCS
Conference Date
-

Novel high-performance computing systems that feature hybrid architectures require large-scale code refactoring to unravel underlying exploitable parallelism. Such redesign can often be accompanied by machine-precision changes, as the order of computation cannot always be maintained. For chaotic systems like climate models, these round-off-level differences can grow rapidly. Systematic errors may also manifest initially as machine-precision differences. Isolating genuine round-off-level differences from such errors remains a challenge. Here, we apply two-sample equality-of-distribution tests to evaluate statistical reproducibility of the ocean model component of the US Department of Energy's Energy Exascale Earth System Model (E3SM). A 2-year control simulation ensemble is compared to a modified ensemble as a test case - after a known non-bit-for-bit change in a model component is introduced - to evaluate the null hypothesis that the two ensembles are statistically indistinguishable. To quantify the false negative rates of these tests, we conduct a formal power analysis using a targeted suite of short simulation ensembles. The ensemble suite contains several perturbed ensembles, each with a progressively different climate than the baseline ensemble - obtained by perturbing the magnitude of a single model tuning parameter, the Gent and McWilliams κ, in a controlled manner. The null hypothesis is evaluated for each of the perturbed ensembles using these tests. The power analysis establishes the detection limits of the tests for a given ensemble size, allowing model developers to evaluate the impact of an introduced non-bit-for-bit change to the model.
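
As a rough illustration of the workflow the abstract describes, the sketch below compares two ensembles with a two-sample test and then estimates the test's power by Monte Carlo. Everything in it is an assumption made for illustration: the Kolmogorov-Smirnov statistic with a permutation p-value stands in for whichever equality-of-distribution tests the paper uses, each ensemble member is reduced to a single synthetic scalar drawn from a Gaussian, and the `shift` parameter is a hypothetical proxy for the climate change induced by perturbing the Gent and McWilliams κ. The function names (`ks_statistic`, `permutation_pvalue`, `estimate_power`) are invented here and are not part of E3SM.

```python
import random
from bisect import bisect_right


def ks_statistic(a, b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between the
    empirical CDFs of samples a and b."""
    sa, sb = sorted(a), sorted(b)
    d = 0.0
    # The largest gap between two step functions occurs at a sample point.
    for x in sa + sb:
        fa = bisect_right(sa, x) / len(sa)
        fb = bisect_right(sb, x) / len(sb)
        d = max(d, abs(fa - fb))
    return d


def permutation_pvalue(a, b, rng, n_perm=200):
    """P-value for the null hypothesis that a and b share one distribution,
    obtained by randomly reassigning ensemble labels."""
    observed = ks_statistic(a, b)
    pooled = list(a) + list(b)
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        if ks_statistic(pooled[:len(a)], pooled[len(a):]) >= observed:
            count += 1
    # Add-one correction keeps the p-value strictly positive.
    return (count + 1) / (n_perm + 1)


def estimate_power(shift, n_members, rng, n_trials=100, alpha=0.05):
    """Fraction of synthetic trials in which the null is rejected.

    ASSUMPTION: control members ~ N(0, 1), perturbed members ~ N(shift, 1).
    Real use would substitute diagnostics from actual model ensembles.
    """
    rejections = 0
    for _ in range(n_trials):
        control = [rng.gauss(0.0, 1.0) for _ in range(n_members)]
        perturbed = [rng.gauss(shift, 1.0) for _ in range(n_members)]
        if permutation_pvalue(control, perturbed, rng) < alpha:
            rejections += 1
    return rejections / n_trials


if __name__ == "__main__":
    rng = random.Random(0)  # fixed seed so the run is repeatable
    for shift in (0.0, 0.5, 1.0, 2.0):
        power = estimate_power(shift, n_members=20, rng=rng, n_trials=50)
        print(f"shift={shift:.1f}  estimated power={power:.2f}")
```

With a shift of zero the rejection rate estimates the false positive rate and should sit near `alpha`; as the shift grows, one minus the rejection rate estimates the false negative rate. Repeating the loop for several values of `n_members` would give a synthetic analogue of the detection limit for a given ensemble size.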