Publication

A New Deadlock Resolution Protocol and Message Matching Algorithm for the Extreme-scale Simulator...

by Christian Engelmann, Thomas J Naughton III
Publication Type
Journal
Journal Name
Concurrency and Computation: Practice and Experience
Publication Date
Page Numbers
3369 to 3389
Volume
28
Issue
12

Investigating the performance of parallel applications at scale on future high-performance computing (HPC) architectures and the performance impact of different HPC architecture choices is an important component of HPC hardware/software co-design. The Extreme-scale Simulator (xSim) is a simulation toolkit for investigating the performance of parallel applications at scale. xSim scales to millions of simulated Message Passing Interface (MPI) processes. The overhead introduced by a simulation tool is an important performance and productivity aspect. This paper documents two improvements to xSim: (1) a new deadlock resolution protocol to reduce the parallel discrete event simulation overhead and (2) a new simulated MPI message matching algorithm to reduce the oversubscription management overhead. The results clearly show a significant performance improvement. The simulation overhead for running the NAS Parallel Benchmark suite was reduced from 102% to 0% for the embarrassingly parallel (EP) benchmark and from 1,020% to 238% for the conjugate gradient (CG) benchmark. xSim offers a highly accurate simulation mode for better tracking of injected MPI process failures. With highly accurate simulation, the overhead was reduced from 3,332% to 204% for EP and from 37,511% to 13,808% for CG.
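To put the reported overhead percentages in perspective, the sketch below converts them into slowdown factors. It assumes, which the abstract does not state explicitly, that overhead is the extra run time of the simulated execution relative to native execution, so that an overhead of p% corresponds to a run time of (1 + p/100) times native. The function names `slowdown_factor` and `improvement_ratio` are hypothetical helpers for illustration and are not part of xSim.

```python
# Assumption (not stated in the abstract): overhead is measured relative to
# native run time, i.e. overhead % = (T_simulated - T_native) / T_native * 100.

def slowdown_factor(overhead_percent: float) -> float:
    """Run time under simulation as a multiple of native run time."""
    return 1.0 + overhead_percent / 100.0


def improvement_ratio(before_percent: float, after_percent: float) -> float:
    """How many times faster the simulated run becomes after the change."""
    return slowdown_factor(before_percent) / slowdown_factor(after_percent)


# Overhead figures (before, after) in percent, taken from the abstract.
reported = {
    "EP": (102.0, 0.0),
    "CG": (1020.0, 238.0),
    "EP, highly accurate mode": (3332.0, 204.0),
    "CG, highly accurate mode": (37511.0, 13808.0),
}

for name, (before, after) in reported.items():
    print(
        f"{name}: {slowdown_factor(before):.2f}x -> "
        f"{slowdown_factor(after):.2f}x native run time "
        f"(about {improvement_ratio(before, after):.1f}x faster)"
    )
```

Under this reading, an overhead of 0% means the simulated EP run takes the same time as the native run, while an overhead of 13,808% means the highly accurate CG simulation still takes roughly 139 times the native run time.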