Publication

Monitoring Extreme-scale Lustre Toolkit...

by Michael J Brim, Joshua K Lothian
Publication Type
Conference Paper
Publication Date
Conference Name
International Workshop on the Lustre Ecosystem: Challenges and Opportunities
Conference Location
Annapolis, Maryland, United States of America
Conference Sponsor
Oak Ridge National Laboratory, U.S. Department of Defense
Conference Date
-

We discuss the design and ongoing development of the Monitoring Extreme-scale Lustre Toolkit (MELT), a unified Lustre performance monitoring and analysis infrastructure that provides continuous, low-overhead summary information on the health and performance of Lustre, as well as on-demand, in-depth problem diagnosis and root-cause analysis. The MELT infrastructure leverages a distributed overlay network to enable monitoring of center-wide Lustre filesystems where clients are located across many network domains. We preview interactive command-line utilities that help administrators and users to observe Lustre performance at various levels of resolution, from individual servers or clients to whole filesystems, including job-level reporting. Finally, we discuss our future plans for automating the root-cause analysis of common Lustre performance problems.
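
The abstract does not describe MELT's data model or overlay protocol, so the following is only a minimal sketch of the general idea it names: summaries are merged up a tree-shaped overlay and can be viewed at server, filesystem, or job resolution. Every name here (`Sample`, `OverlayNode`, `summarize`, the field names, and the sample values) is hypothetical and chosen for illustration; none of it is taken from MELT itself.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass
class Sample:
    """One hypothetical I/O sample; the fields are illustrative, not MELT's schema."""
    filesystem: str
    server: str
    job_id: str
    bytes_read: int
    bytes_written: int


@dataclass
class OverlayNode:
    """A node in an assumed tree overlay: leaves hold samples, parents merge children."""
    samples: list = field(default_factory=list)
    children: list = field(default_factory=list)

    def summarize(self, key):
        """Return {group: (bytes_read, bytes_written)} for this subtree.

        `key` maps a Sample to the grouping value, which selects the
        resolution of the report (server, filesystem, or job).
        """
        totals = defaultdict(lambda: [0, 0])
        # Fold in the samples collected locally at this node.
        for s in self.samples:
            totals[key(s)][0] += s.bytes_read
            totals[key(s)][1] += s.bytes_written
        # Merge the already-reduced summaries of each child, so only
        # compact totals (not raw samples) travel up the tree.
        for child in self.children:
            for group, (read, written) in child.summarize(key).items():
                totals[group][0] += read
                totals[group][1] += written
        return {group: tuple(v) for group, v in totals.items()}


# Two leaf nodes standing in for collectors in different network domains.
leaf_a = OverlayNode(samples=[
    Sample("fs1", "oss1", "job42", 100, 10),
    Sample("fs1", "oss2", "job42", 50, 5),
])
leaf_b = OverlayNode(samples=[
    Sample("fs1", "oss1", "job7", 20, 200),
    Sample("fs2", "oss9", "job7", 1, 2),
])
root = OverlayNode(children=[leaf_a, leaf_b])

by_filesystem = root.summarize(lambda s: s.filesystem)
by_server = root.summarize(lambda s: s.server)
by_job = root.summarize(lambda s: s.job_id)
```

Reducing at each interior node keeps the volume of data reaching the root roughly proportional to the number of groups rather than the number of clients, which is one plausible way a continuous summary could stay low-overhead; whether MELT reduces data this way is not stated in the abstract.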