Publication

IRIS: Exploring Performance Scaling of the Intelligent Runtime System and its Dynamic Scheduling Policies

Publication Type
Conference Paper
Book Title
2024 IEEE International Parallel and Distributed Processing Symposium Workshops (IPDPSW)
Publication Date
Page Numbers
58 to 67
Publisher Location
New Jersey, United States of America
Conference Name
The thirty-third Heterogeneity in Computing Workshop (HCW), in conjunction with IPDPS24
Conference Location
San Francisco, California, United States of America
Conference Sponsor
IEEE
Conference Date
-

High-Performance Computing is becoming increasingly heterogeneous, relying on a diverse mix of hardware to achieve good performance. Paradoxically, current drivers and frameworks for these devices typically require separate languages and implementations for each vendor. Furthermore, there are few tools and little support for scheduling codes between these devices in a truly heterogeneous manner, partly because of this fragmentation between vendors and the languages each supports. To overcome both limitations, the Intelligent Runtime System (IRIS) was developed. It allows a common task abstraction to be shared automatically among contemporary vendors and is driven from a single host-side API. At runtime, IRIS queries the host system and registers which frameworks and drivers are available; these determine which kernels can be used by the scheduler: CPUs via OpenMP, Nvidia GPUs via CUDA, AMD GPUs via HIP, and Intel and Xilinx FPGAs via OpenCL. IRIS enables tasks to be scheduled to any heterogeneous device and resolves to the appropriate kernel binary at runtime; it only uses the devices supported by the system on which it is run. IRIS supports both single-task and graph-based expressions of task dependencies. Additionally, IRIS features a range of dynamic scheduling policies, allowing complex chains of tasks and interactions to be executed and relieving the programmer or user of the need to study the system in order to assign tasks to devices optimally. This paper presents the peak performance attainable by IRIS over a range of systems, each with different numbers and types of accelerator devices. It highlights the flexibility of IRIS, since these devices are truly heterogeneous, relying on different backends (drivers, frameworks, and languages) that historically required unique implementations to utilize them. We then use this peak performance as a baseline to compare increasingly complex chains of tasks (with increasingly complex task dependencies) and evaluate how IRIS copes. Finally, we consider the performance of different IRIS scheduling policies on this range of task graphs.
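
The idea of scheduling a task graph onto whichever devices can run each kernel can be illustrated with a small toy model. The sketch below is not the IRIS API and does not reproduce any IRIS scheduling policy. The device table, kernel table, task names, and the "least-loaded eligible device" rule are all hypothetical, chosen only to show how backend availability can restrict placement while dependencies fix the order.

```python
from collections import deque

# Hypothetical device table: device name -> backend found on the host.
DEVICES = {"cpu0": "openmp", "gpu0": "cuda", "gpu1": "hip"}

# Hypothetical kernel table: kernel name -> backends that have a binary for it.
KERNELS = {
    "saxpy": {"openmp", "cuda", "hip"},
    "fft": {"cuda"},
}


def topological_order(deps):
    """Return tasks in an order that respects dependencies (Kahn's algorithm)."""
    indegree = {task: len(parents) for task, parents in deps.items()}
    children = {task: [] for task in deps}
    for task, parents in deps.items():
        for parent in parents:
            children[parent].append(task)
    ready = deque(sorted(t for t, n in indegree.items() if n == 0))
    order = []
    while ready:
        task = ready.popleft()
        order.append(task)
        for child in children[task]:
            indegree[child] -= 1
            if indegree[child] == 0:
                ready.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency cycle in task graph")
    return order


def schedule(tasks, deps, devices=DEVICES, kernels=KERNELS):
    """Place each task on the least-loaded device whose backend has its kernel.

    This greedy rule is an illustrative assumption, not an IRIS policy.
    """
    load = {device: 0 for device in devices}
    placement = {}
    for task in topological_order(deps):
        kernel = tasks[task]
        candidates = [d for d, backend in devices.items()
                      if backend in kernels[kernel]]
        if not candidates:
            raise RuntimeError(f"no available device can run kernel {kernel!r}")
        # Ties are broken by device name so the result is deterministic.
        chosen = min(candidates, key=lambda d: (load[d], d))
        load[chosen] += 1
        placement[task] = chosen
    return placement


# Four tasks: "c" needs both "a" and "b"; "d" needs "c".
tasks = {"a": "saxpy", "b": "saxpy", "c": "fft", "d": "saxpy"}
deps = {"a": [], "b": [], "c": ["a", "b"], "d": ["c"]}
placement = schedule(tasks, deps)
```

In this toy model the `fft` task can only land on the CUDA device, because that is the only backend with a binary for it, while the `saxpy` tasks are spread across all three devices. A real runtime would also account for data movement and measured execution time, which this sketch ignores.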