Publication

Scaling Ensembles of Data-Intensive Quantum Chemical Calculations for Millions of Molecules

Publication Type
Conference Paper
Book Title
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Publication Date
Page Numbers
1047 to 1056
Publisher Location
New Jersey, United States of America
Conference Name
The 25th IEEE International Workshop on Parallel and Distributed Scientific and Engineering Computing (PDSEC 2024)
Conference Location
San Francisco, California, United States of America
Conference Sponsor
IEEE
Conference Date

Deep learning models are efficient computational tools that can accelerate the inverse design of molecules with desired functional properties by generating predictions in a fraction of the time required by traditional quantum chemical approaches. To ensure that a model maintains accuracy and transferability across the broad regions of chemical space explored during inverse design, it must be trained on very large volumes of simulation data. Collecting this data requires running large-scale ensembles of quantum chemical calculations on high-performance computing (HPC) systems. However, executing such large ensembles efficiently and managing the large volumes of output data require tools that use computational resources judiciously and limit metadata overhead on the file system. We therefore present a high-performance, scalable ensemble management framework for performing data-intensive quantum chemical electronic structure calculations on organic molecules. The framework provides abstractions for plugging in different ab initio, first-principles, and first-principles-based semi-empirical methods and executes them efficiently at large scale on HPC systems. It dynamically distributes tasks to resources and uses tiered storage to manage large collections of files. We employed this framework to process over ten million organic molecules and generate open-source datasets that provide UV-vis absorption spectra computed with time-dependent density-functional tight-binding calculations. The resulting database is the largest collection of molecular optical spectra simulated with quantum chemical methods in a consistent manner.
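The two mechanisms named in the abstract, dynamic task distribution and tiered storage, can be illustrated with a minimal Python sketch. This is not the framework's implementation. It assumes a thread pool as a stand-in for HPC resources, a temporary directory as the fast storage tier, and a single tar archive as the slower tier that keeps the file count (and therefore metadata overhead) low. The names `run_calculation` and `run_ensemble` are hypothetical, and the calculation itself is a placeholder that writes a dummy output file.

```python
import tarfile
import tempfile
from concurrent.futures import ThreadPoolExecutor, as_completed
from pathlib import Path


def run_calculation(molecule_id: str, scratch: Path) -> Path:
    """Hypothetical stand-in for one quantum chemical calculation.

    A real task would invoke an electronic structure code here; this
    placeholder only writes a small output file to the fast tier.
    """
    out = scratch / f"{molecule_id}.out"
    out.write_text(f"placeholder result for {molecule_id}\n")
    return out


def run_ensemble(molecule_ids, archive_path: Path, workers: int = 4) -> int:
    """Run all tasks on a worker pool, then bundle outputs into one archive."""
    with tempfile.TemporaryDirectory() as tmp:
        scratch = Path(tmp)  # assumed fast tier (e.g. node-local storage)
        with ThreadPoolExecutor(max_workers=workers) as pool:
            # Idle workers pick up the next queued task, so faster tasks
            # do not wait on slower ones (dynamic distribution).
            futures = [pool.submit(run_calculation, m, scratch) for m in molecule_ids]
            outputs = [f.result() for f in as_completed(futures)]
        # Many small files become one archive on the slower tier,
        # which reduces the number of file system metadata entries.
        with tarfile.open(archive_path, "w") as tar:
            for path in sorted(outputs):
                tar.add(path, arcname=path.name)
    return len(outputs)
```

Calling `run_ensemble(["mol_a", "mol_b"], Path("results.tar"))` would run both placeholder tasks and leave a single archive holding their outputs. On a real HPC system the pool would span many nodes and the archive step would typically run per batch rather than once at the end.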