MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives...

by Richard L Graham, Galen M Shipman

Publication Type

Conference Paper

Journal Name

Lecture Notes in Computer Science

Publication Date

September, 2008

Page Numbers

130 to 140

Volume

5205

Conference Name

Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface

Conference Location

Dublin, Ireland

Conference Date

Sep 7, 2008

Abstract

With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host, shared-memory-optimized algorithms for MPI_Bcast, MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small-data operations, where synchronization costs are relatively large, fan-in/fan-out algorithms generally perform best. For large messages, data manipulation constitutes the largest cost, and reduce-scatter algorithms are best for reductions. These optimizations improve performance by up to a factor of three. Memory and cache sharing effects require deliberate process layout and careful radix selection for tree-based methods.
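The two algorithm families named in the abstract can be illustrated with a minimal single-process sketch. It is not the authors' implementation: it simulates ranks as list entries instead of using MPI or real shared memory, and the names `fan_in_reduce` and `reduce_scatter_allreduce` are hypothetical. The first function shows how the tree radix sets the number of synchronization levels; the second shows how reduce-scatter divides the arithmetic on a large buffer evenly among ranks.

```python
from typing import List, Tuple


def fan_in_reduce(values: List[float], radix: int) -> Tuple[float, int]:
    """Sum one value per simulated rank up a radix-ary tree rooted at rank 0.

    Returns the total and the number of tree levels (synchronization steps).
    """
    if radix < 2:
        raise ValueError("radix must be at least 2")
    partial = list(values)
    stride, levels = 1, 0
    while stride < len(partial):
        # At this level, each parent absorbs up to radix - 1 children.
        for parent in range(0, len(partial), stride * radix):
            for k in range(1, radix):
                child = parent + k * stride
                if child < len(partial):
                    partial[parent] += partial[child]
        stride *= radix
        levels += 1
    return partial[0], levels


def reduce_scatter_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Allreduce (sum) expressed as reduce-scatter followed by allgather."""
    nprocs, n = len(buffers), len(buffers[0])
    # Rank r owns the contiguous slice [bounds[r], bounds[r + 1]).
    bounds = [r * n // nprocs for r in range(nprocs + 1)]
    # Reduce-scatter: each rank reduces only its own slice, reading the
    # other ranks' buffers directly, as shared memory would permit.
    owned = [
        [sum(buf[i] for buf in buffers) for i in range(bounds[r], bounds[r + 1])]
        for r in range(nprocs)
    ]
    # Allgather: every rank copies all reduced slices into its result.
    full = [x for piece in owned for x in piece]
    return [list(full) for _ in range(nprocs)]


if __name__ == "__main__":
    print(fan_in_reduce([1, 2, 3, 4, 5, 6, 7, 8], radix=2))
    print(fan_in_reduce([1, 2, 3, 4, 5, 6, 7, 8], radix=4))
    print(reduce_scatter_allreduce([[1, 2, 3, 4], [10, 20, 30, 40]]))
```

In this sketch a larger radix gives fewer levels but more children per parent, which is one way to see why radix selection interacts with how processes share memory and cache.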
