Abstract
Electronic structure codes have struggled to achieve efficient parallel implementations on
petascale-class hardware. One notable exception is Qbox, a planewave
pseudopotential electronic structure code that achieved a performance of 207 TFlops on a
BlueGene/L computer.
Qbox relies on the MPI message-passing library for parallelization. NWChem, in contrast,
uses the Global Arrays library, which lets the software developer work at a high level
of abstraction while using one-sided communication to efficiently exploit
the network hardware. In the remainder of the paper, we discuss recent benchmarks
and scientific results obtained with NWChem on a parallel computer whose theoretical peak
performance exceeds 1 PFlops.