Publication

Detecting Outliers in Network Transfers with Feature Extraction...

by Nageswara S Rao, Mariam Kiran, Cong Wang, Anirban Mandal
Publication Type
Conference Paper
Book Title
Proceedings of DISE1: Joint Workshop on Deep (or Machine) Learning for Safety-Critical Applications in Engineering
Conference Name
DISE1: Joint Workshop on Deep (or Machine) Learning for Safety-Critical Applications in Engineering
Conference Location
Stockholm, Sweden
Conference Sponsor
ICML

Reliable file transfers are essential for successful science computations and experiments that need large, complex data files to be moved across long distances. Networks use protocols such as TCP and UDP to support file transfers, but the performance of these protocols degrades under packet losses and duplications, thereby adversely affecting science applications. Hence, scientists and engineers often monitor network statistics to find and repair the network problems that cause such degradations.

Outliers or anomalies can be detected using statistical and machine learning methods that highlight abnormal behaviors. However, without knowing what is normal, determining what counts as an anomaly is difficult and subjective. In this paper, we investigate statistical and machine learning approaches that extract previously unknown features, without supervision, from TCP and perfSONAR data sets, with the aim of identifying anomalies such as packet loss, duplication, and retransmission sequencing that affect file transfer performance. Simple statistical and feature extraction methods (e.g., PCA) show that detecting anomalies is simpler if labeled data sets of normal behavior are available. While perfSONAR logs record losses, they do not detect duplications and reordering. We show that PCA applied to TCP statistics is able to determine abnormal behavior in the above cases. In particular, our results show that simple feature extraction techniques provide building blocks for more advanced network monitoring tools that can quickly flag anomalies using simple measurements, thereby helping to build confidence in file transfer reliability.
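
To make the idea of PCA-based anomaly flagging concrete, the following is a minimal sketch and not the authors' implementation. It assumes a table of per-transfer TCP statistics, projects the standardized rows onto the leading principal component, and scores each row by its reconstruction error. The data are synthetic, and the column meanings, the helper names (`make_synthetic_transfers`, `pca_residual_scores`, `flag_outliers`), the number of components, and the median-plus-MAD threshold are all illustrative assumptions rather than details from the paper.

```python
import numpy as np


def make_synthetic_transfers(n=200, seed=0):
    """Build hypothetical per-transfer TCP statistics (synthetic, not real data)."""
    rng = np.random.default_rng(seed)
    load = rng.normal(size=n)  # hidden factor that drives all four columns
    # Assumed columns: throughput, RTT, retransmits, duplicate ACKs.
    weights = np.array([-1.0, 0.8, 0.5, 0.6])
    X = load[:, None] * weights + 0.1 * rng.normal(size=(n, 4))
    anomalies = np.array([10, 50, 120])
    X[anomalies, 2] += 5.0  # retransmit bursts that break the usual correlation
    return X, anomalies


def pca_residual_scores(X, n_components=1):
    """Squared reconstruction error of each row after projecting onto the top PCs."""
    mean = X.mean(axis=0)
    std = X.std(axis=0)
    std = np.where(std == 0, 1.0, std)  # avoid dividing by zero on constant columns
    Z = (X - mean) / std
    # Rows of Vt are the principal directions of the standardized data.
    _, _, Vt = np.linalg.svd(Z, full_matrices=False)
    components = Vt[:n_components]
    Z_hat = Z @ components.T @ components
    return np.sum((Z - Z_hat) ** 2, axis=1)


def flag_outliers(scores, k=3.0):
    """Flag scores above median + k * scaled MAD (an assumed, robust threshold)."""
    median = np.median(scores)
    mad = np.median(np.abs(scores - median))
    return scores > median + k * 1.4826 * mad


if __name__ == "__main__":
    X, injected = make_synthetic_transfers()
    scores = pca_residual_scores(X, n_components=1)
    flags = flag_outliers(scores)
    print("injected rows:", injected)
    print("flagged rows: ", np.flatnonzero(flags))
```

Scoring by reconstruction error fits the unsupervised setting described above: rows that follow the dominant correlation among the statistics reconstruct well, while rows that break it stand out. The threshold may still flag some ordinary rows, so in practice it would need tuning against transfers known to be normal.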