Publication

KokkACC: Enhancing Kokkos with OpenACC

by Pedro Valero Lara, Seyong Lee, Marc Gonzalez Tallada, Jeffrey S Vetter, Joel E Denny
Publication Type
Conference Paper
Book Title
2022 Workshop on Accelerator Programming Using Directives (WACCPD)
Publication Date
Page Numbers
32 to 42
Publisher Location
New Jersey, United States of America
Conference Name
Workshop on Accelerator Programming Using Directives (WACCPD 2022)
Conference Location
Dallas, Texas, United States of America
Conference Sponsor
The International Conference for High Performance Computing, Networking, Storage and Analysis
Conference Date

Template metaprogramming is gaining popularity as a high-level solution for achieving performance portability on heterogeneous computing resources. Kokkos is a representative approach: it offers programmers high-level abstractions for generic programming, while most of the device-specific code generation and optimization is delegated to the compiler through template specializations. To this end, Kokkos provides a set of device-specific code specializations in multiple back ends, such as CUDA and HIP. Unlike CUDA or HIP, OpenACC is a high-level, directive-based programming model. This descriptive model allows developers to insert hints (pragmas) into their code that help the compiler parallelize it. The compiler is responsible for transforming the code, a process that is completely transparent to the programmer. This paper presents KokkACC, an OpenACC back end for Kokkos. As an alternative to Kokkos’s existing device-specific back ends, KokkACC is a multi-architecture back end that provides a high-productivity programming environment enabled by OpenACC’s high-level and descriptive programming model. Moreover, we have observed competitive performance; in some cases, KokkACC is faster (up to 9×) than NVIDIA’s CUDA back end and much faster than OpenMP’s GPU offloading back end. This work also includes implementation details and a detailed performance study conducted with a set of mini-benchmarks (AXPY and DOT product) and three mini-apps (LULESH, miniFE, and SNAP, a LAMMPS proxy mini-app).