Scalable Out-of-Core Solvers on Xeon Phi Cluster...

by Eduardo F D'azevedo, Ki Chan, Shiquan Su, Kwai L Wong
Publication Type
Book Chapter
Publication Date
Page Numbers
443 to 455
Publisher Name
Morgan Kaufmann
Publisher Location
Waltham, Massachusetts, United States of America

This paper documents the implementation of a distributed out-of-core
(OOC) solver for performing LU and Cholesky factorizations of a large dense
matrix on clusters of many-core programmable co-processors. The out-of-core
algorithm combines the left-looking and right-looking schemes to
minimize the movement of data between the CPU host and the
co-processor, improving data locality as well as computing throughput. The
OOC solver is built to align with the format of the ScaLAPACK software
library, making it readily portable to any existing codes using ScaLAPACK.
A runtime analysis conducted on Beacon (an Intel Xeon plus Intel Xeon Phi
cluster composed of 48 nodes of multi-core CPUs and MICs) at the National
Institute for Computational Sciences is presented. A comparison of the
performance on the Intel Xeon Phi and GPU clusters is also provided.
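
As a rough illustration of how left-looking and right-looking schemes can be combined, the following single-process Python sketch factors a small symmetric positive definite matrix one column panel at a time. It is not the chapter's implementation: the function name ooc_cholesky, the panel_width parameter, and the use of plain Python lists for "host" and "co-processor" storage are all assumptions made for this example, and the split shown here (left-looking across panels, right-looking inside each panel) is one plausible reading of the abstract rather than a confirmed description of the solver. The distributed ScaLAPACK layout and the actual host-to-device transfers are omitted.

```python
import math


def ooc_cholesky(a, panel_width):
    """Return the lower Cholesky factor of the SPD matrix `a` (list of lists).

    Hypothetical sketch: `host` stands in for the large matrix kept in host
    memory, and `panel` stands in for the part resident on the co-processor.
    """
    n = len(a)
    host = [row[:] for row in a]
    for j0 in range(0, n, panel_width):
        j1 = min(j0 + panel_width, n)
        w = j1 - j0
        # "Copy in": bring rows j0..n-1 of the current column panel on device.
        panel = [host[i][j0:j1] for i in range(j0, n)]
        # Left-looking step: apply the updates from every already-factored
        # panel to the left, so the current panel is written back only once.
        for k0 in range(0, j0, panel_width):
            k1 = min(k0 + panel_width, j0)
            for i in range(j0, n):
                for c in range(w):
                    s = 0.0
                    for k in range(k0, k1):
                        s += host[i][k] * host[j0 + c][k]
                    panel[i - j0][c] -= s
        # Right-looking step inside the panel: factor one column, then
        # immediately update the remaining columns of the same panel.
        for c in range(w):
            d = math.sqrt(panel[c][c])
            for r in range(c, n - j0):
                panel[r][c] /= d
            for cc in range(c + 1, w):
                for r in range(cc, n - j0):
                    panel[r][cc] -= panel[r][c] * panel[cc][c]
        # "Copy out": store the factored panel back in host memory.
        for i in range(j0, n):
            host[i][j0:j1] = panel[i - j0]
    # Clear the strictly upper triangle, which the factor does not use.
    for i in range(n):
        for j in range(i + 1, n):
            host[i][j] = 0.0
    return host


if __name__ == "__main__":
    a = [[4.0, 2.0, 2.0, 2.0],
         [2.0, 5.0, 3.0, 3.0],
         [2.0, 3.0, 6.0, 4.0],
         [2.0, 3.0, 4.0, 7.0]]
    for row in ooc_cholesky(a, panel_width=2):
        print(row)
```

In this sketch each panel is copied in once and copied out once, while previously factored columns are only read. That access pattern is the reason a left-looking outer loop is attractive when transfers between host and co-processor are expensive.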