IRIS: Exploring Performance Scaling of the Intelligent Runtime System and its Dynamic Scheduling Policies

Publication Type

Conference Paper

Book Title

2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)

Publication Date

May 2024

Page Numbers

58 to 67

Publisher Location

New Jersey, United States of America

Conference Name

The thirty-third Heterogeneity in Computing Workshop (HCW), in conjunction with IPDPS24

Conference Location

San Francisco, California, United States of America

Conference Sponsor

IEEE

Conference Date

May 27, 2024 - May 31, 2024

Abstract

High-Performance Computing is becoming increasingly heterogeneous, relying on a diverse mix of hardware to achieve good performance. Paradoxically, current drivers and frameworks for these devices typically require separate languages and implementations for each vendor. Furthermore, there are few tools and little support to schedule codes between these devices in a truly heterogeneous manner, partly because of this fragmentation between vendors and the languages each supports. To overcome both limitations, the Intelligent Runtime System (IRIS) was developed. It allows a common task abstraction to be shared automatically among contemporary vendors and is run from a single host-side API. At runtime, IRIS queries the host system and registers which frameworks and drivers are available; these determine which kernels can be used by the scheduler: CPUs via OpenMP, Nvidia GPUs via CUDA, AMD GPUs via HIP, and Intel and Xilinx FPGAs via OpenCL. IRIS enables tasks to be scheduled to any heterogeneous device and resolves to the appropriate kernel binary at runtime; it only uses the devices supported by the system on which it is run. IRIS supports single-task and graph-based expressions of task dependencies. Additionally, IRIS features a range of dynamic scheduling policies, allowing complex chains of tasks and interactions to be executed and relieving the programmer/user from having to consider the system in order to assign tasks to devices optimally. This paper presents the peak performance attainable by IRIS over a range of systems, each with different numbers and types of accelerator devices. It highlights the flexibility of IRIS, since these devices are truly heterogeneous, relying on different backends (drivers, frameworks, and languages) which historically required unique implementations to utilize them. We then use this peak performance as a baseline to compare increasingly complex chains of tasks (with increasingly complex task dependencies) and evaluate how IRIS copes. Finally, we consider the performance of different IRIS scheduling policies on this range of task graphs.