Self-stabilizing Connected Components

by Piyush K Sao, Christian Engelmann, Srinivas Eswar, Oded Green, Richard Vuduc

Publication Type

Conference Paper

Book Title

2019 IEEE/ACM 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS)

Publication Date

November, 2019

Page Numbers

50 to 59

Conference Name

The 9th Workshop on Fault Tolerance for HPC at eXtreme Scale (FTXS) 2019

Conference Location

Denver, Colorado, United States of America

Conference Sponsor

ACM

Conference Date

Nov 22, 2019

View DOI Listing

Abstract

For the problem of computing the connected components of a graph, this paper considers the design of algorithms that are resilient to transient hardware faults, like bit Hips. More specifically, it applies the technique of self-stabilization. A system is self-stabilizing if, when starting from a valid or invalid state, it is guaranteed to reach a valid state after a finite number of steps. Therefore on a machine subject to a transient fault, a self-stabilizing algorithm could recover if that fault caused the system to enter an invalid state. We give a comprehensive analysis of the valid and invalid states during label propagation and derive algorithms to verify and correct the invalid state. The self-stabilizing label-propagation algorithm performs O (V log V) additional computation and requires O (V) additional storage over its conventional counterpart (and, as such, does not increase asymptotic complexity over conventional label propagation). When run against a battery of simulated fault injection tests, the self-stabilizing label propagation algorithm exhibits more resilient behavior than a triple modular redundancy (TMR) based fault-tolerant algorithm in 80% of cases. From a performance perspective, it also outperforms TMR as it requires fewer iterations in total. Beyond the fault tolerance properties of self-stabilizing label-propagation, we believe, they are useful from the theoretical perspective; and may have other use-cases.

Self-stabilizing Connected Components

Abstract

Researchers

Organizations