Publication

On the effective implementation of a boundary element code on graphics processing units using an out-of-core LU algorithm...

by Eduardo F. D'Azevedo, Sylvain Nintcheu Fata
Publication Type
Journal
Journal Name
Engineering Analysis with Boundary Elements
Publication Date
Page Numbers
1246 to 1255
Volume
36
Issue
8
A collocation boundary element code for solving the three-dimensional Laplace equation, publicly available from http://www.intetec.org, has been adapted to run on an Nvidia Tesla general-purpose graphics processing unit (GPU). Global matrix assembly and LU factorization of the resulting dense matrix were performed on the GPU. Out-of-core techniques were used to solve problems larger than the available GPU memory. The code achieved a speedup of more than eight times in matrix assembly and about 56 Gflop/s in the LU factorization using only 512 Mbytes of GPU memory. Details of the GPU implementation and comparisons with the standard sequential algorithm are included to illustrate the performance of the GPU code.
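To make the out-of-core idea concrete, the following is a minimal sketch of a left-looking, panel-based LU factorization in which only one column panel is treated as resident in fast memory at a time. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the algorithm variant, NumPy arrays stand in for host and GPU storage, pivoting is omitted for brevity (so the input is assumed to be safely factorable without row exchanges, e.g. diagonally dominant), and the names `out_of_core_lu` and `panel_width` are hypothetical.

```python
import numpy as np

def out_of_core_lu(a_host, panel_width):
    """Left-looking LU without pivoting (illustrative sketch only).

    Only one column panel is modified "in core" at a time; previously
    factored panels are read back one by one to update it.
    Returns a matrix holding L (unit lower, strictly below the diagonal)
    and U (on and above the diagonal).
    """
    lu = np.array(a_host, dtype=float)  # stands in for host (slow) storage
    n = lu.shape[0]
    for c0 in range(0, n, panel_width):
        c1 = min(c0 + panel_width, n)
        panel = lu[:, c0:c1].copy()  # "transfer" the current panel in core

        # Apply the updates from every already-factored panel to the left.
        for k0 in range(0, c0, panel_width):
            k1 = k0 + panel_width  # always <= c0
            l_block = lu[k0:, k0:k1].copy()  # "transfer" factored columns
            l_diag = np.tril(l_block[:panel_width], -1) + np.eye(panel_width)
            # Rows k0:k1 of the panel become the corresponding rows of U.
            panel[k0:k1] = np.linalg.solve(l_diag, panel[k0:k1])
            # Remaining rows receive the Schur-complement update.
            panel[k1:] -= l_block[panel_width:] @ panel[k0:k1]

        # Factor the updated panel in place (unblocked, no pivoting).
        for j in range(c1 - c0):
            d = c0 + j  # global row index of the pivot
            panel[d + 1:, j] /= panel[d, j]
            panel[d + 1:, j + 1:] -= np.outer(panel[d + 1:, j], panel[d, j + 1:])

        lu[:, c0:c1] = panel  # "transfer" the finished panel back
    return lu

# Usage: factor a small diagonally dominant matrix and rebuild it from L and U.
rng = np.random.default_rng(0)
a = rng.standard_normal((10, 10)) + 20.0 * np.eye(10)
lu = out_of_core_lu(a, panel_width=4)
l_factor = np.tril(lu, -1) + np.eye(10)
u_factor = np.triu(lu)
reconstruction_ok = np.allclose(l_factor @ u_factor, a)
```

For scale, if the matrix is stored in double precision and 512 Mbytes is read as 512 MiB (both assumptions, since the abstract does not state them), a dense matrix of order 8192 already fills that memory, which is why larger problems have to be processed panel by panel.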