Abstract
The Locally Self-consistent Multiple Scattering (LSMS) code solves the
first-principles density functional theory Kohn-Sham equation for a
wide range of materials with a special focus on metals, alloys and
metallic nano-structures. It has
traditionally exhibited near-perfect scalability on massively parallel
high-performance computer architectures. We present our efforts to
accelerate the LSMS code on GPUs in order to enable first-principles
calculations for systems of O(100,000) atoms and statistical physics
sampling of finite-temperature properties. We
reimplement the scattering matrix calculation for GPUs
with a block matrix inversion algorithm that uses only accelerator
memory.
Using the Cray XK7 system Titan at the
Oak Ridge Leadership Computing Facility, we achieve a sustained
performance of 14.5 PFlop/s and a speedup of 8.6 compared to the
CPU-only code.
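
The abstract does not spell out the block matrix inversion scheme, so the following is only a minimal sketch of one standard formulation, not the LSMS implementation. It assumes that only the leading k x k block of the inverse is required and obtains it by repeatedly eliminating trailing blocks through the Schur complement, A - B D^-1 C, so that the working matrix shrinks in place and never needs a second full-size copy. Pure Python lists stand in for dense linear algebra in accelerator memory; all function names and the block size parameter are hypothetical.

```python
# Illustrative sketch only: pure-Python stand-in for dense linear
# algebra that would run on the GPU. All names are hypothetical.

def matmul(a, b):
    """Return the product of two list-of-lists matrices."""
    return [[sum(a[i][p] * b[p][j] for p in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def matsub(a, b):
    """Return the elementwise difference a - b."""
    return [[x - y for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def invert(m):
    """Invert a square matrix by Gauss-Jordan elimination with partial pivoting."""
    n = len(m)
    # Augment m with the identity matrix on the right.
    aug = [list(row) + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(m)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) == 0:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [v - f * w for v, w in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def split(m, cut):
    """Partition m into blocks [[A, B], [C, D]] with A of size cut x cut."""
    a = [row[:cut] for row in m[:cut]]
    b = [row[cut:] for row in m[:cut]]
    c = [row[:cut] for row in m[cut:]]
    d = [row[cut:] for row in m[cut:]]
    return a, b, c, d

def leading_block_of_inverse(m, k, block):
    """Return the leading k x k block of inverse(m), for k >= 1.

    Each pass eliminates up to `block` trailing rows and columns by
    replacing the working matrix with the Schur complement A - B D^-1 C.
    The leading block of the inverse is unchanged by this reduction.
    """
    work = [list(row) for row in m]
    while len(work) > k:
        cut = max(k, len(work) - block)
        a, b, c, d = split(work, cut)
        work = matsub(a, matmul(b, matmul(invert(d), c)))
    return invert(work)
```

For a well-conditioned test matrix, leading_block_of_inverse(m, k, block) agrees with the corresponding block of invert(m) up to rounding error for any block size. In a GPU setting the small inversions and products would map to batched dense kernels, which is an assumption of this sketch rather than a statement about the LSMS code.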