Publication

Characterizing the Performance of Executing Many-tasks on Summit...

by Matteo Turilli, Andre Merzky, Thomas J Naughton III, Wael R Elwasif, Shantenu Jha
Publication Type
Conference Paper
Book Title
2019 IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
Publication Date
Page Numbers
18 to 25
Publisher Location
New York, United States of America
Conference Name
IEEE/ACM Third Annual Workshop on Emerging Parallel and Distributed Runtime Systems and Middleware (IPDRM)
Conference Location
Denver, Colorado, United States of America
Conference Sponsor
IEEE/ACM
Conference Date
-

Many scientific workloads are comprised of many tasks, where each task is an independent simulation or analysis of data. The execution of millions of tasks on heterogeneous HPC platforms requires scalable dynamic resource management and multi-level scheduling. RADICAL-Pilot (RP), an implementation of the Pilot abstraction, addresses these challenges and serves as an effective runtime system to execute workloads comprised of many tasks. In this paper, we characterize the performance of executing many tasks using RP when interfaced with JSM and PRRTE on Summit: RP is responsible for resource management and task scheduling on acquired resources; JSM or PRRTE enact the placement and launching of scheduled tasks. Our experiments provide lower bounds on the performance of RP when integrated with JSM and PRRTE. Specifically, for workloads comprised of homogeneous, single-core, 15-minute-long tasks, we find that: PRRTE scales better than JSM for > O(1000) tasks; PRRTE overheads are negligible; and PRRTE supports optimizations that lower the impact of overheads and enable resource utilization of 63% when executing O(16K) 1-core tasks over 404 compute nodes.