Publication

Overlapping Computation and Communication: Barrier Algorithms and ConnectX-2 CORE-Direct Capabilities...

Publication Type
Conference Paper
Publication Date
Conference Name
24th IEEE International Parallel and Distributed Processing Symposium (IPDPS 2010)
Conference Location
Atlanta, Georgia, United States of America
Conference Date

This paper explores the computation and communication overlap enabled by the new CORE-Direct hardware capabilities introduced in the ConnectX-2 InfiniBand (IB) Host Channel Adapter (HCA). These capabilities allow data-dependent communication sequences to progress and complete at the network level without any Central Processing Unit (CPU) involvement. We use the latency-dominated nonblocking barrier algorithm in this study and find that, at a process count of 64, a contiguous time slot of about 80 percent of the nonblocking barrier time is available for computation. This time slot grows as the number of participating processes increases. In contrast, CPU-based implementations provide a time slot of at most 30 percent of the nonblocking barrier time. This bodes well for the scalability of simulations employing offloaded collective operations. These capabilities can be used to reduce the effects of system noise and, when used with nonblocking collective operations, have the potential to hide the effects of application load imbalance.
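
To make "data-dependent communication sequence" concrete, the following is a minimal sketch of a barrier communication schedule, together with the arithmetic behind the reported overlap windows. It assumes a recursive-doubling barrier over a power-of-two process count; the abstract does not name the algorithm used, so this choice, the function names, and the 10-microsecond barrier time are all hypothetical and serve only as illustration.

```python
def recursive_doubling_schedule(num_procs):
    """Return one list per round; entry [rank] is that rank's partner.

    Assumption: recursive doubling, power-of-two process counts only.
    Each round depends on the previous one having completed, which is
    the kind of data-dependent sequence the abstract says can progress
    in the network without CPU involvement.
    """
    if num_procs < 1 or num_procs & (num_procs - 1):
        raise ValueError("this sketch handles power-of-two process counts only")
    rounds = num_procs.bit_length() - 1  # log2(num_procs)
    return [[rank ^ (1 << k) for rank in range(num_procs)]
            for k in range(rounds)]


def overlap_window_us(barrier_time_us, percent_available):
    """Contiguous time available for computation during the barrier."""
    return barrier_time_us * percent_available / 100


schedule = recursive_doubling_schedule(64)
print("rounds:", len(schedule))
print("partners of rank 0:", [round_[0] for round_ in schedule])

# Hypothetical barrier time; only the percentages come from the abstract.
barrier_time_us = 10
print("offloaded window (us):", overlap_window_us(barrier_time_us, 80))
print("CPU-based window (us):", overlap_window_us(barrier_time_us, 30))
```

Under these assumptions, a 64-process barrier takes six dependent rounds. The percentages from the abstract translate into a computation window more than twice as long for the offloaded case as for the CPU-based one.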