Exploring the Optimal Platform Configuration for Power-Constrained HPC Workflows

by Kun Tang, Xubin He, Saurabh Gupta, Sudharshan S Vazhkudai, Devesh Tiwari

Publication Type

Conference Paper

Journal Name

Proceedings of the 27th International Conference on Computer Communications and Networks

Book Title

2018 27th International Conference on Computer Communication and Networks (ICCCN)

Publication Date

August, 2018

Page Numbers

1 to 10

Publisher Location

New Jersey, United States of America

Conference Name

27th International Conference on Computer Communications and Networks (ICCCN 2018)

Conference Location

Hangzhou, China

Conference Sponsor

IEEE

Conference Date

Jul 30, 2018 - Aug 2, 2018

Abstract

In high-performance computing (HPC) workflows, data analytics is typically used to gain insights from scientific simulations. As the exascale era approaches, online analysis is gaining popularity because it saves I/O to persistent storage. As computing capability keeps growing, power consumption is becoming critical to HPC facilities, and enforcing power limits is emerging as a practical approach for power-constrained facilities. However, it remains unclear how to choose appropriate power limits for various HPC workflows and how to distribute a workflow's power limit between simulation and analysis. In addition, given a power limit, it is unclear what the optimal scales and power capping levels are for various workflows, especially when reliability is taken into account. To resolve these issues, we propose a reliability-aware model to determine these platform configurations for HPC workflows. We also validate our model and present model-driven studies for a wide range of real-system scenarios. Our study reveals how platform configuration affects the performance and energy efficiency of HPC workflows under power constraints.
