Publication

Filtering log data: finding the needles in the haystack...

Publication Type
Conference Paper
Publication Date
Conference Name
42nd IEEE/IFIP International Conference on Dependable Systems and Networks (DSN 2012)
Conference Location
Boston, Massachusetts, United States of America
Conference Sponsor
IEEE/IFIP
Conference Date
-

Log data is an invaluable asset for troubleshooting
large-scale systems. However, as system scale keeps growing,
the volume of such data becomes overwhelming,
placing an enormous burden on both data storage and data
analysis. To address this problem, we present a two-dimensional
online filtering mechanism that removes redundant and noisy data
via feature selection and instance selection. The objective of
this work is two-fold: (i) to significantly reduce data volume
without losing important information, and (ii) to make data
analysis more effective. We evaluate this new filtering mechanism
using real environmental data from the production
supercomputers at Oak Ridge National Laboratory and Sandia
National Laboratory. Our preliminary results demonstrate that
our method can reduce disk space usage by more than 85%, thereby
significantly reducing analysis time. Moreover, it improves
failure prediction and diagnosis by more than 20%
compared to the conventional predictive approach relying on
RAS (Reliability, Availability, and Serviceability) events alone.
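
The two filtering dimensions named in the abstract can be illustrated with a minimal sketch. It is not the paper's algorithm: it assumes a simple variance threshold for feature selection (dropping columns that never change) and consecutive-duplicate removal for instance selection (dropping rows that repeat the last kept row), and it runs in batch rather than online. All function names, parameters, and sample sensor fields below are hypothetical.

```python
from statistics import pvariance

def select_features(records, min_variance=1e-9):
    """Column filter: keep features whose values vary across records.
    The variance threshold is an assumed criterion, not the paper's."""
    names = records[0].keys() if records else []
    return [n for n in names if pvariance([r[n] for r in records]) > min_variance]

def select_instances(records, features, tolerance=0.0):
    """Row filter: drop a record that matches the last kept record
    on every retained feature (within the given tolerance)."""
    kept = []
    for r in records:
        row = {n: r[n] for n in features}
        if kept and all(abs(row[n] - kept[-1][n]) <= tolerance for n in features):
            continue  # redundant reading, skip it
        kept.append(row)
    return kept

def filter_log(records, min_variance=1e-9, tolerance=0.0):
    """Apply the column filter first, then the row filter."""
    features = select_features(records, min_variance)
    return select_instances(records, features, tolerance)

# Hypothetical environmental readings; "fan" never changes.
sample = [
    {"temp": 40.0, "fan": 1.0, "volt": 12.0},
    {"temp": 40.0, "fan": 1.0, "volt": 12.0},
    {"temp": 55.0, "fan": 1.0, "volt": 12.0},
    {"temp": 55.0, "fan": 1.0, "volt": 11.5},
]
filtered = filter_log(sample)
```

In this sample the constant "fan" column is removed and the repeated second reading is dropped, leaving three records with two features each.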