Publication

Exploring the Optimal Platform Configuration for Power-Constrained HPC Workflows...

by Kun Tang, Xubin He, Saurabh Gupta, Sudharshan S Vazhkudai, Devesh Tiwari
Publication Type
Conference Paper
Journal Name
Proceedings of the 27th International Conference on Computer Communications and Networks
Book Title
2018 27th International Conference on Computer Communication and Networks (ICCCN)
Page Numbers
1 to 10
Issue
1
Publisher Location
New Jersey, United States of America
Conference Name
27th International Conference on Computer Communications and Networks (ICCCN 2018)
Conference Location
Hangzhou, China
Conference Sponsor
IEEE

In high-performance computing (HPC) workflows, data analytics is typically used to gain insights from scientific simulations. As we approach the exascale era, online analysis is gaining popularity because it reduces I/O to persistent storage. As computing capability keeps growing, power consumption is becoming a critical concern for HPC facilities, and enforcing power limits is emerging as a practical approach for power-constrained facilities. However, it remains unclear how to choose appropriate power limits for various HPC workflows and how to distribute a workflow's power limit between simulation and analysis. In addition, given a power limit, it is unclear what the optimal scales and power-capping levels are for various workflows, especially when reliability is taken into account. To resolve these issues in power-constrained HPC, in this paper we propose a reliability-aware model to determine these platform configurations for HPC workflows. We validate our model and present model-driven studies for a wide range of real-system scenarios. Our study reveals insights into how platform configuration affects the performance and energy efficiency of HPC workflows under power constraints.
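To make the configuration question concrete, the following is a toy sketch of the kind of search a reliability-aware model could drive. Under a workflow power limit, it chooses how many nodes go to simulation, how many go to online analysis, and which per-node power cap to apply. It is not the model from this paper. The power-law speed model, the first-order checkpoint/restart waste based on Young's interval, and every parameter value (`NODE_TDP_W`, `CAP_LEVELS`, `ALPHA`, `NODE_MTBF_H`, `CKPT_COST_H`) are hypothetical assumptions made for illustration.

```python
import math
from itertools import product

# Hypothetical parameters for illustration only; none come from the paper.
NODE_TDP_W = 200.0            # uncapped per-node power draw (watts)
CAP_LEVELS = (0.6, 0.8, 1.0)  # per-node power cap as a fraction of TDP
ALPHA = 0.7                   # assumed exponent of speed vs. power cap
NODE_MTBF_H = 50_000.0        # assumed per-node mean time between failures (hours)
CKPT_COST_H = 0.05            # assumed time to write one checkpoint (hours)


def node_speed(cap: float) -> float:
    """Relative per-node speed under a power cap (toy power-law model)."""
    return cap ** ALPHA


def failure_waste(nodes: int) -> float:
    """First-order checkpoint/restart waste fraction using Young's interval.

    System MTBF shrinks as 1/nodes. With the interval sqrt(2 * C * M),
    the wasted fraction of time is roughly sqrt(2 * C / M).
    """
    system_mtbf = NODE_MTBF_H / nodes
    return math.sqrt(2.0 * CKPT_COST_H / system_mtbf)


def workflow_time(n_sim, n_ana, cap, sim_work, ana_work):
    """Expected time of a pipelined simulation + online-analysis workflow."""
    speed = node_speed(cap)
    # Simulation and analysis run concurrently, so the slower side dominates.
    base = max(sim_work / (n_sim * speed), ana_work / (n_ana * speed))
    waste = failure_waste(n_sim + n_ana)
    if waste >= 1.0:
        return math.inf  # failures dominate; configuration is unusable
    return base / (1.0 - waste)


def best_config(power_limit_w, max_nodes, sim_work, ana_work):
    """Exhaustively search node split and cap level under a power limit.

    Returns (time, n_sim, n_ana, cap, power_w), or None if nothing fits.
    """
    best = None
    for n_sim, n_ana, cap in product(range(1, max_nodes + 1),
                                     range(1, max_nodes + 1), CAP_LEVELS):
        power = (n_sim + n_ana) * cap * NODE_TDP_W
        if power > power_limit_w:
            continue  # violates the workflow's power limit
        t = workflow_time(n_sim, n_ana, cap, sim_work, ana_work)
        if best is None or t < best[0]:
            best = (t, n_sim, n_ana, cap, power)
    return best


if __name__ == "__main__":
    print(best_config(power_limit_w=20_000, max_nodes=128,
                      sim_work=1000.0, ana_work=300.0))
```

An exhaustive search is used only because this toy space is small. A fitted or measured performance and reliability model would replace `node_speed` and `failure_waste`, and a larger space would call for a smarter optimizer.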