Publication

A Conceptual Framework for HPC Operational Data Analytics...

by Alessio Netti, Woong Shin, Michael Ott, Torsten Wilde, Natalie Bates
Publication Type
Conference Paper
Book Title
Proceedings of 2021 IEEE International Conference on Cluster Computing (CLUSTER)
Page Numbers
596 to 603
Publisher Location
United States of America
Conference Name
Energy Efficient HPC State of the Practice Workshop 2021 (CLUSTER 2021)
Conference Location
Portland, Oregon, United States of America
Conference Sponsor
Energy Efficient HPC Working Group (https://eehpcwg.llnl.gov)

This paper provides a broad framework for understanding trends in Operational Data Analytics (ODA) for High-Performance Computing (HPC) facilities. The goal of ODA is to allow for the continuous monitoring, archiving, and analysis of near real-time performance data, providing immediately actionable information for multiple operational uses. In this work, we combine two models to provide a comprehensive HPC ODA framework: one is an evolutionary model of analytics capabilities that consists of four types, which are descriptive, diagnostic, predictive and prescriptive, while the other is a four-pillar model for energy-efficient HPC operations that covers facility, system hardware, system software, and applications. This new framework is then overlaid with a description of current development and production deployments of ODA within leading-edge HPC facilities. Finally, we perform a comprehensive survey of ODA works and classify them according to our framework, in order to demonstrate its effectiveness.