
Network Offloaded Hierarchical Collectives Using ConnectX-2's CORE-Direct Capabilities...

by Ishai Rabinovitz, Pavel Shamis, Richard L Graham, Noam Bloch, Gilad Shainer
Publication Type: Conference Paper
Conference Name: EuroMPI 2010
Conference Location: Stuttgart, Germany
As the scale of High Performance Computing (HPC) systems continues to increase, demanding that we extract even more parallelism from applications, the need to move communication management away from the Central Processing Unit (CPU) becomes even greater. Moving this management to the network frees up CPU cycles for computation, making it possible to overlap computation and communication. In this paper we continue to investigate how best to use the new CORE-Direct support added in the ConnectX-2 Host Channel Adapter (HCA) to create high-performance, asynchronous collective operations that are managed by the HCA. Specifically, we take the network topology into account by creating a two-level communication hierarchy. This reduces the MPI Barrier completion time by 45%, from 26.59 microseconds when network topology is not considered to 14.72 microseconds, compared with 19.04 microseconds for the CPU-based collective barrier operation. The nonblocking barrier algorithm has similar performance, with about 50% of that time available for computation.
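The abstract does not say which algorithm runs at each level of the hierarchy or how ranks map to nodes, so the following is only a minimal sketch under stated assumptions. It assumes block placement (rank `r` lives on node `r // ppn`), a fan-in to a node leader and a fan-out from it within each node, and recursive doubling among node leaders. All function names are hypothetical, and the code models only the message schedule, not CORE-Direct HCA offload or any MPI calls. It compares a flat recursive-doubling barrier with the two-level version by counting messages that cross the network, and it checks that every rank's release depends on every rank's arrival.

```python
def flat_recursive_doubling(n):
    """Rounds of (src, dst) messages for a flat recursive-doubling barrier.

    Assumes n is a power of two; each round doubles the exchange distance.
    """
    assert n > 0 and n & (n - 1) == 0, "n must be a power of two"
    rounds, dist = [], 1
    while dist < n:
        rounds.append([(r, r ^ dist) for r in range(n)])
        dist <<= 1
    return rounds


def hierarchical_barrier(n, ppn):
    """Hypothetical two-level barrier schedule.

    Assumption: rank r is on node r // ppn, and the lowest rank on each node
    is its leader. The number of nodes must be a power of two.
    """
    assert n % ppn == 0, "ranks must divide evenly across nodes"
    nodes = n // ppn
    leaders = [k * ppn for k in range(nodes)]
    rounds = []
    # Phase 1: intra-node fan-in, each non-leader notifies its leader.
    fan_in = [(r, (r // ppn) * ppn) for r in range(n) if r % ppn]
    if fan_in:
        rounds.append(fan_in)
    # Phase 2: recursive doubling among node leaders only (crosses the network).
    for rnd in flat_recursive_doubling(nodes):
        rounds.append([(leaders[a], leaders[b]) for a, b in rnd])
    # Phase 3: intra-node fan-out, each leader releases its local ranks.
    fan_out = [((r // ppn) * ppn, r) for r in range(n) if r % ppn]
    if fan_out:
        rounds.append(fan_out)
    return rounds


def inter_node_messages(rounds, ppn):
    """Count messages whose source and destination are on different nodes."""
    return sum(1 for rnd in rounds for s, d in rnd if s // ppn != d // ppn)


def is_complete_barrier(rounds, n):
    """True if, after the schedule, every rank has (transitively) heard
    from every other rank, i.e. no rank can leave before all have arrived."""
    known = [{r} for r in range(n)]
    for rnd in rounds:
        snapshot = [set(k) for k in known]  # messages in a round are concurrent
        for src, dst in rnd:
            known[dst] |= snapshot[src]
    return all(len(k) == n for k in known)


if __name__ == "__main__":
    n, ppn = 16, 4
    flat = flat_recursive_doubling(n)
    hier = hierarchical_barrier(n, ppn)
    print(len(flat), inter_node_messages(flat, ppn))  # 4 32
    print(len(hier), inter_node_messages(hier, ppn))  # 4 8
```

With 16 ranks on 4 nodes, both schedules take 4 rounds, but in this model the topology-aware version sends 8 messages across the network instead of 32. The remaining traffic stays inside a node. This illustrates, under the assumptions above, why accounting for topology can shorten barrier completion time. It does not reproduce the paper's measured latencies.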