Abstract
We describe the CoMet application for large-scale epistatic Genome-Wide Association Studies (eGWAS) and pleiotropy studies. High performance is attained by recasting the underlying vector comparison methods as generalized, distributed dense linear algebra operations. The 2-way and 3-way Proportional Similarity metric and Custom Correlation Coefficient are implemented using native or adapted GEMM kernels optimized for GPU architectures. Through aggressive overlapping of communications, transfers, and computations, high efficiency relative to single-GPU kernel performance is maintained up to the full Titan and Summit systems. Nearly 300 quadrillion element comparisons per second and over 2.3 mixed-precision ExaOps are reached on Summit by using the Tensor Core hardware of the Nvidia Volta GPUs. This performance is four to five orders of magnitude beyond the comparable state of the art. CoMet is currently being used in projects ranging from bioenergy to clinical genomics, including studies of the genetics of chronic pain and opioid addiction.
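To illustrate how a vector comparison method can be recast as a generalized dense linear algebra operation, the following minimal sketch computes a 2-way Proportional Similarity metric for all pairs of vectors. It assumes the commonly used Czekanowski form, c(u, v) = 2 Σ_k min(u_k, v_k) / (Σ_k u_k + Σ_k v_k), with nonnegative vectors of positive sum; the abstract itself does not state the formula. The numerator has the loop structure of a matrix product in which multiplication is replaced by `min`. The function names `min_plus_product` and `proportional_similarity` are hypothetical and are not taken from CoMet, and this pure-Python reference makes no attempt to reproduce the GPU kernels or the distributed parallelism described above.

```python
def min_plus_product(A, B):
    """GEMM-like operation with multiply replaced by min.

    A and B store vectors as columns: A[k][i] is element k of vector i.
    Returns C with C[i][j] = sum over k of min(A[k][i], B[k][j]).
    """
    n_elems = len(A)
    n_a, n_b = len(A[0]), len(B[0])
    C = [[0.0] * n_b for _ in range(n_a)]
    for i in range(n_a):
        for j in range(n_b):
            C[i][j] = float(sum(min(A[k][i], B[k][j]) for k in range(n_elems)))
    return C


def proportional_similarity(V):
    """All-pairs 2-way Proportional Similarity (assumed Czekanowski form).

    V stores nonnegative vectors as columns; every column sum must be positive.
    """
    n_elems, n_vecs = len(V), len(V[0])
    numer = min_plus_product(V, V)
    # Column sums form the denominator terms of the metric.
    col_sums = [sum(V[k][i] for k in range(n_elems)) for i in range(n_vecs)]
    return [
        [2.0 * numer[i][j] / (col_sums[i] + col_sums[j]) for j in range(n_vecs)]
        for i in range(n_vecs)
    ]


if __name__ == "__main__":
    # Three vectors of length 3, stored as columns (illustrative data only).
    V = [
        [1, 1, 3],
        [2, 2, 0],
        [3, 3, 1],
    ]
    for row in proportional_similarity(V):
        print(row)
```

Because the triple loop in `min_plus_product` matches that of a standard matrix product, the same blocking and data-movement strategies used for GEMM carry over, which is the property that the adapted kernels mentioned above rely on.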