Abstract
In a traditional Partitioned Global Address Space (PGAS) language like UPC, an application programmer works with the model of a static set of threads performing locality-aware accesses on a global address space. In contrast, asynchronous programming provides a simple interface for expressing the concurrency in dynamic, irregular algorithms, with the prospect of efficient, portable execution by sophisticated runtime schemes that handle the exposed concurrency. In this paper, we adopt the asynchronous style of programming to parallelize a nested, tree-based code in UPC. To maximize performance without losing the ease of application programming, we design Asynchronous Remote Methods as a potential extension to the UPC standard. Our prototype implementation of this construct in Berkeley UPC achieves performance within 7% of ideal and, in some cases, a 20-fold improvement over the original Standard UPC solution.