Proteome-scale Deployment of Protein Structure Prediction Workflows on the Summit Supercomputer...

Publication Type

Conference Paper

Book Title

Proceedings of 2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Publication Date

May 2022

Page Numbers

206 to 215

Publisher Location

New Jersey, United States of America

Conference Name

21st IEEE International Workshop on High Performance Computational Biology

Conference Location

Lyon, France, and virtual

Conference Sponsor

IEEE

Conference Date

May 30, 2022

Abstract

Deep learning has contributed to major advances in the prediction of protein structure from sequence, a fundamental problem in structural bioinformatics. With predictions now approaching the accuracy of crystallographic experiments, and with accelerators such as GPUs and TPUs making inference with large models rapid, genome-level structure prediction becomes an obvious aim. Leadership-class computing resources can be used to perform genome-scale protein structure prediction using state-of-the-art deep learning models, providing a wealth of new data for systems biology applications. Here we describe our efforts to efficiently deploy the AlphaFold v.2 program at scale for full-proteome structure prediction on the Oak Ridge Leadership Computing Facility's resources, including the Summit supercomputer. We performed inference to produce predicted structures for 40,526 protein sequences, corresponding to four prokaryotic proteomes and one plant proteome, using under 4,400 total Summit node hours, equivalent to using the majority of the supercomputer for a little over one hour. We also designed an optimized structure refinement that reduced the time for the relaxation stage of the AlphaFold pipeline by over 10× for longer sequences. We demonstrate the types of analyses that can be performed on proteome-scale collections of sequences, including a search for novel quaternary structures and implications for functional annotation.
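The throughput figures in the abstract can be put in per-sequence terms with a short calculation. The sketch below is illustrative only: the sequence count and the node-hour bound come from the abstract, while the node count (4,608) and GPUs per node (6) are assumptions taken from Summit's publicly documented configuration, not from this record. Because the abstract gives an upper bound ("under 4,400"), every derived value is also an upper bound, and the function names are hypothetical.

```python
# Figures quoted in the abstract; the node-hour value is an upper bound.
SEQUENCES = 40_526
NODE_HOURS = 4_400

# Assumptions, not stated in the abstract: Summit's publicly documented
# configuration of 4,608 compute nodes with 6 GPUs per node.
SUMMIT_NODES = 4_608
GPUS_PER_NODE = 6


def node_minutes_per_sequence(node_hours: float, sequences: int) -> float:
    """Average node time spent per predicted structure, in minutes."""
    return node_hours / sequences * 60


def gpu_hours_per_sequence(node_hours: float, sequences: int,
                           gpus_per_node: int) -> float:
    """Average GPU time per structure, assuming every GPU on a node is busy."""
    return node_hours * gpus_per_node / sequences


def machine_fraction(node_hours: float, wall_hours: float,
                     total_nodes: int) -> float:
    """Fraction of the machine needed to spend node_hours in wall_hours."""
    return node_hours / (wall_hours * total_nodes)


if __name__ == "__main__":
    print(f"node-minutes per sequence: "
          f"{node_minutes_per_sequence(NODE_HOURS, SEQUENCES):.1f}")
    print(f"GPU-hours per sequence: "
          f"{gpu_hours_per_sequence(NODE_HOURS, SEQUENCES, GPUS_PER_NODE):.2f}")
    print(f"fraction of Summit for a one-hour run: "
          f"{machine_fraction(NODE_HOURS, 1.0, SUMMIT_NODES):.2f}")
```

Under these assumptions, spending the full 4,400 node hours in exactly one hour would occupy most, but not all, of the machine, which is consistent with the abstract's description of using the majority of the supercomputer for a little over one hour.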
