Optimizing Blocking and Nonblocking Reduction Operations for Multicore Systems: Hierarchical Design and Implementation

This work proposes a design for implementing blocking and nonblocking reduction collective operations on modern multicore systems. An implementation based on this design performed an order of magnitude better than the state of the art on a variety of systems, including Cray and InfiniBand systems. The Conjugate Gradient solver using this implementation completed over 195% faster than it did with the state-of-the-art reductions. The reduction implementations are integrated into Open MPI, a popular implementation of the MPI standard, and we expect to release them publicly as part of a future Open MPI release. A paper describing the design, implementation, and evaluation of these reductions has been accepted for publication in the IEEE Cluster 2013 conference proceedings.
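The summary above does not detail the hierarchical design named in the title. As a rough illustration only, the following Python sketch simulates a generic two-level reduction: ranks on the same node are combined first, then the per-node partial results are combined, and the final value is handed back to every rank. It is not the Open MPI implementation and makes no MPI calls; the function name `hierarchical_allreduce` and the parameter `ranks_per_node` are hypothetical, and the operator is assumed to be associative.

```python
from functools import reduce
from operator import add


def hierarchical_allreduce(values, ranks_per_node, op=add):
    """Simulate a two-level all-reduce over per-rank values.

    values: one contribution per simulated rank, in rank order.
    ranks_per_node: hypothetical number of ranks that share a node.
    op: associative binary operator (assumption of this sketch).
    """
    if ranks_per_node < 1:
        raise ValueError("ranks_per_node must be at least 1")
    if not values:
        raise ValueError("need at least one rank")

    # Level 1: each node combines the contributions of its local ranks.
    node_partials = [
        reduce(op, values[start:start + ranks_per_node])
        for start in range(0, len(values), ranks_per_node)
    ]

    # Level 2: the per-node partial results are combined across nodes.
    total = reduce(op, node_partials)

    # Every simulated rank ends up with the same final result.
    return [total] * len(values)


# Eight simulated ranks, four per node, summing their rank numbers.
result = hierarchical_allreduce(list(range(8)), ranks_per_node=4)
```

For an associative operator the two-level result matches a flat reduction over all ranks; the hierarchy only changes which partial results are combined where.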

Team Members: Manjunath Gorentla Venkata, Pavel Shamis, Richard L. Graham, Joshua S. Ladd, and Rahul Sampath

