Optimizing Communication in 2D Grid-Based MPI Applications at Exascale

Publication Type: Conference Paper
Book Title: EuroMPI '23: Proceedings of the 30th European MPI Users' Group Meeting
Publication Date:
Page Numbers: 1–11
Publisher Location: New York, NY, USA
Conference Name: EuroMPI '23: European MPI Users' Group Meeting
Conference Location: Bristol, United Kingdom
Conference Sponsor: University of Bristol
Conference Date: -

Exascale computing poses many challenges to achieving optimal performance on large numbers of nodes. A key challenge is the efficient use of the Message Passing Interface (MPI), a critical component for interprocess communication. This paper explores communication optimization strategies that harness the GPU-accelerated architectures of these supercomputers. We focus on MPI applications whose processes are arranged in a two-dimensional grid, a common configuration in applications built around dense matrix operations. This configuration offers a unique opportunity to apply strategies that improve performance while maintaining effective load distribution. We study two applications, Dist-FW (all-pairs shortest path, APSP) and HPL-MxP (mixed-precision LU factorization), on two accelerated systems: Summit (IBM Power and NVIDIA V100) and Frontier (AMD EPYC and MI250X). These supercomputers are operated by the Oak Ridge Leadership Computing Facility (OLCF) and are currently ranked #1 (Frontier) and #5 (Summit) on the TOP500 list. We show how to scale both applications to exascale levels and address the MPI challenges related to implementation, synchronization, and performance. We also compare the performance of several communication strategies at an unprecedented scale. As the computational scale grows, accurately predicting application performance becomes crucial for cost reduction. To that end, we propose a hyperbolic model as a better alternative to the traditional one-sided asymptotic model for predicting future application performance at such large scales.
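
For readers unfamiliar with the layout the abstract refers to, the short C sketch below shows one standard way to set up a two-dimensional process grid with MPI's Cartesian-topology routines (MPI_Dims_create, MPI_Cart_create, MPI_Cart_sub) and to derive the row and column communicators that dense-matrix kernels typically broadcast along. It is an illustrative assumption about the general technique, not the implementation used in the paper.

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);

    int world_size, world_rank;
    MPI_Comm_size(MPI_COMM_WORLD, &world_size);
    MPI_Comm_rank(MPI_COMM_WORLD, &world_rank);

    /* Let MPI pick a near-square Pr x Pc factorization of the process count. */
    int dims[2] = {0, 0};
    MPI_Dims_create(world_size, 2, dims);

    /* Build the 2D Cartesian grid (no wraparound; allow rank reordering). */
    int periods[2] = {0, 0};
    MPI_Comm grid_comm;
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &grid_comm);

    int grid_rank, coords[2];
    MPI_Comm_rank(grid_comm, &grid_rank);
    MPI_Cart_coords(grid_comm, grid_rank, 2, coords);

    /* Split out row and column communicators so collectives (e.g. panel
       broadcasts in LU-style factorizations) run along one grid dimension. */
    int keep_cols[2] = {0, 1};   /* vary column index -> ranks in the same row    */
    int keep_rows[2] = {1, 0};   /* vary row index    -> ranks in the same column */
    MPI_Comm row_comm, col_comm;
    MPI_Cart_sub(grid_comm, keep_cols, &row_comm);
    MPI_Cart_sub(grid_comm, keep_rows, &col_comm);

    printf("world rank %d -> grid position (%d,%d) in a %d x %d grid\n",
           world_rank, coords[0], coords[1], dims[0], dims[1]);

    MPI_Comm_free(&row_comm);
    MPI_Comm_free(&col_comm);
    MPI_Comm_free(&grid_comm);
    MPI_Finalize();
    return 0;
}

Once the row and column communicators exist, a step of a 2D dense-matrix algorithm typically reduces to a broadcast or reduction over row_comm or col_comm, which is where scale-dependent choices of communication strategy, such as those compared in the paper, matter most.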