Publication

Toward designing effective exascale scientific computing workflows: experiences and best practices...

by Mark A Coletti, Russell B Davidson, Ada A Sedova
Publication Type
Conference Paper
Book Title
PROCEEDINGS OF THE 2022 IMPROVING SCIENTIFIC SOFTWARE CONFERENCE
Publication Date
Page Numbers
1 to 12
Publisher Location
Boulder, Colorado, United States of America
Conference Name
SEA's Improving Scientific Software Conference (SEA 2022)
Conference Location
Boulder, Colorado, United States of America
Conference Sponsor
UCAR and NCAR
Conference Date
-

Many fields within scientific computing have embraced advances in big-data analysis and machine learning, an adoption that often requires deploying large, distributed, and complicated workflows that may combine neural network training, simulation, inference, database queries, and data analysis in asynchronous, parallel, and pipelined execution frameworks. This shift has brought into focus the need for scalable, efficient workflow management solutions that support reproducibility, error and provenance handling, traceability, and checkpoint-restart, among other capabilities. Here, we discuss challenges and best practices for deploying exascale-generation computational science workflows on resources at the Oak Ridge Leadership Computing Facility (OLCF). We present our experiences with large-scale deployment of distributed workflows on the Summit supercomputer, including workflows for bioinformatics and computational biophysics, materials science, and deep learning model optimization. We also present the problems that arise from working within a Python-centric software base on traditional HPC systems, along with solutions to them, and discuss steps that will be required before the convergence of HPC, AI, and data science can be fully realized. Our results point to a wealth of exciting new possibilities for harnessing this convergence to tackle new scientific challenges.