Publication

HPC Usage Behavior Analysis and Performance Estimation with Machine Learning Techniques...

by Hao Zhang, Haihang You, Bilel Hadri, Mark R Fahey
Publication Type
Conference Paper
Conference Name
18th International Conference on Parallel and Distributed Processing Techniques and Applications
Conference Location
Las Vegas, Nevada, United States of America

Most researchers with little high performance computing (HPC) experience have difficulty using supercomputing resources productively. To address this issue, we investigated usage behaviors on Kraken, the world’s fastest academic supercomputer, and built a knowledge-based recommendation system to improve user productivity. Six clustering techniques, along with three cluster validation measures, were implemented to investigate the underlying patterns of usage behaviors. In addition to a manually defined category for very large job submissions, six behavior categories were identified, which cleanly separated data-intensive jobs from computation-intensive jobs. Job statistics of each behavior category were then used to develop a knowledge-based recommendation system that gives users guidance on choosing appropriate software packages, setting job parameter values, and estimating job queuing time and runtime. Experiments comprising 127 job submissions by users from different research fields were conducted to evaluate the proposed recommendation system. Feedback was very positive and indicated the usefulness of the provided information. The experiments achieved an average runtime estimation accuracy of 64.2%, with a 28.9% job termination rate, almost double the average accuracy in the Kraken dataset.
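
The abstract does not name the six clustering techniques or the three validation measures, so the following is only an illustrative sketch of the general idea: cluster job records, then score each candidate clustering with a validation measure. It assumes k-means as the clustering technique and the mean silhouette coefficient as the validation measure; neither is confirmed by the source. The job features (log-scaled core count and log-scaled I/O volume) and all data values are invented for illustration and are not taken from the Kraken dataset.

```python
import math
import random

# Hypothetical job records: (log2 core count, log2 I/O volume).
# Invented values; NOT from the Kraken dataset.
jobs = [
    (1.0, 8.0), (1.2, 7.8), (0.9, 8.3), (1.1, 8.1),  # I/O-heavy, few cores
    (8.0, 1.0), (7.9, 1.2), (8.2, 0.9), (8.1, 1.1),  # compute-heavy, many cores
]

def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's k-means; returns one cluster label per point."""
    rng = random.Random(seed)
    centers = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest center.
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        # Update step: each center moves to the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, l in zip(points, labels) if l == c]
            if members:
                new_centers.append(
                    tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels

def silhouette(points, labels):
    """Mean silhouette coefficient; higher means better-separated clusters."""
    clusters = set(labels)
    if len(clusters) < 2:
        return 0.0  # undefined for a single cluster; treated as 0 here
    scores = []
    for i, p in enumerate(points):
        same = [math.dist(p, q) for j, q in enumerate(points)
                if labels[j] == labels[i] and j != i]
        if not same:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(same) / len(same)  # mean distance within own cluster
        b = min(                   # mean distance to the nearest other cluster
            sum(dists) / len(dists)
            for c in clusters - {labels[i]}
            for dists in [[math.dist(p, q)
                           for q, l in zip(points, labels) if l == c]]
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)

# Score several candidate cluster counts and keep the best-scoring one.
results = {k: silhouette(jobs, kmeans(jobs, k)) for k in (2, 3, 4)}
best_k = max(results, key=results.get)
labels = kmeans(jobs, best_k)
print(best_k, labels)
```

In this toy setting the validation score is what selects the number of behavior categories; repeating the same loop over several clustering techniques and several validation measures would be one way to compare them.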