Publication

MPI Support for Multi-core Architectures: Optimized Shared Memory Collectives...

by Richard L Graham, Galen M Shipman
Publication Type
Conference Paper
Journal Name
Lecture Notes in Computer Science
Publication Date
Page Numbers
130 to 140
Volume
5205
Conference Name
Proceedings of the 15th European PVM/MPI Users' Group Meeting on Recent Advances in Parallel Virtual Machine and Message Passing Interface
Conference Location
Dublin, Ireland
Conference Date

With local core counts on the rise, taking advantage of shared memory to optimize collective operations can improve performance. We study several on-host shared-memory optimized algorithms for MPI_Bcast,
MPI_Reduce, and MPI_Allreduce, using tree-based and reduce-scatter algorithms. For small data operations with relatively large synchronization
costs, fan-in/fan-out algorithms generally perform best. For large
messages, data manipulation constitutes the largest cost, and reduce-scatter
algorithms are best for reductions. These optimizations improve performance
by up to a factor of three. Memory and cache sharing effects require
deliberate process layout and careful radix selection for tree-based
methods.
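
The two algorithm families named in the abstract can be illustrated with a small single-process simulation. This is only a sketch, not the authors' implementation: it models each "process" as a Python list instead of a shared-memory segment, assumes a heap-style k-ary tree rooted at rank 0, and uses hypothetical helper names (`kary_children`, `fan_in_sum`, `reduce_scatter_allreduce`) that do not come from the paper or from any MPI library.

```python
from typing import List


def kary_children(rank: int, radix: int, nprocs: int) -> List[int]:
    """Children of `rank` in a k-ary tree rooted at rank 0 (assumed heap-style numbering)."""
    first = rank * radix + 1
    return [c for c in range(first, first + radix) if c < nprocs]


def fan_in_sum(values: List[float], radix: int) -> float:
    """Fan-in reduction: each parent adds the partial results of its children."""
    partial = list(values)
    nprocs = len(partial)
    # Children always have higher ranks than their parent, so walking ranks
    # in descending order finishes every subtree before its parent reads it.
    for rank in range(nprocs - 1, -1, -1):
        for child in kary_children(rank, radix, nprocs):
            partial[rank] += partial[child]
    return partial[0]


def reduce_scatter_allreduce(buffers: List[List[float]]) -> List[List[float]]:
    """Allreduce (sum) built from a reduce-scatter phase and an allgather phase."""
    nprocs = len(buffers)
    length = len(buffers[0])
    # Process p owns the slice [bounds[p], bounds[p + 1]) of the vector.
    bounds = [p * length // nprocs for p in range(nprocs + 1)]
    # Reduce-scatter: each process reduces only its own slice over all inputs,
    # so the arithmetic is split evenly across processes.
    owned = [
        [sum(buf[i] for buf in buffers) for i in range(bounds[p], bounds[p + 1])]
        for p in range(nprocs)
    ]
    # Allgather: every process ends up with the concatenation of all slices.
    full = [x for chunk in owned for x in chunk]
    return [list(full) for _ in range(nprocs)]
```

In the tree variant, the radix sets how many children each parent must wait on, which is the parameter the abstract says needs careful selection. In the reduce-scatter variant, each process touches only its share of the data during the reduction, which is consistent with the abstract's point that data manipulation dominates the cost for large messages.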