Experiences with High-Level Programming Directives for Porting Applications to GPUs

Publication Type

Conference Paper

Book Title

Facing the Multicore-Challenge II

Publication Date

September, 2012

Page Numbers

96 to 107

Volume

7174

Conference Name

Facing the Multicore-Challenge II

Conference Location

Karlsruhe, Germany

Conference Date

Sep 28, 2011 - Sep 30, 2011

Abstract

HPC systems now exploit GPUs within their compute nodes to accelerate program
performance. As a result, high-end application development has become extremely
complex at the node level. In addition to restructuring the node code to
exploit the cores and specialized devices, the programmer may need to choose
a programming model such as OpenMP or CPU threads in conjunction with an
accelerator programming model to share and manage the different node
resources. This comes at a time when programmer productivity and the ability to
produce portable code have been recognized as major concerns. In order to
offset the high development cost of creating CUDA or OpenCL kernels, directives
have been proposed for programming accelerator devices, but their implications
are not well known.
In this paper, we evaluate state-of-the-art accelerator directives by using
them to program several application kernels, explore transformations to achieve
good performance, and examine the expressiveness and performance penalty of
using high-level directives versus CUDA. We also compare our results to OpenMP
implementations to understand the benefits of running the kernels on the
accelerator versus on CPU cores.
