Research Highlight

Leveraging Advanced AI Machinery to Detect Intelligent Attacks in CPS

Achievement: A multidisciplinary team of researchers from Virginia Polytechnic Institute and State University (Virginia Tech) and Oak Ridge National Laboratory (ORNL) proposed CANShield, a deep learning-based intrusion detection framework that detects advanced and stealthy attacks in high-dimensional, signal-level controller area network (CAN) data. CAN is the de facto automobile communication standard. CANShield features a data processing pipeline for the high-dimensional CAN signal stream: it maintains a temporary data queue and uses a forward-filling mechanism to fill in missing data. This pipeline prepares a data stream suitable for training and testing a machine learning (ML)-based intrusion detection system (IDS). Evaluation on real-world attack datasets shows CANShield’s robustness and responsiveness against different advanced attacks.
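The queue-and-forward-fill idea can be illustrated with a minimal sketch. The class name `SignalQueue`, the signal names, and the message format (a dictionary of decoded signal values per CAN message) are assumptions made for illustration and are not taken from CANShield itself.

```python
from collections import deque


class SignalQueue:
    """Fixed-length queue of signal snapshots with forward filling (illustrative)."""

    def __init__(self, signal_names, maxlen):
        self.signal_names = list(signal_names)
        # Last value seen for each tracked signal; None until first observed.
        self.last_seen = {name: None for name in self.signal_names}
        # deque(maxlen=...) drops the oldest row automatically when full.
        self.rows = deque(maxlen=maxlen)

    def push(self, decoded_message):
        """decoded_message maps signal name -> value for ONE CAN message."""
        for name, value in decoded_message.items():
            if name in self.last_seen:
                self.last_seen[name] = value
        # Signals absent from this message keep their last value (forward fill).
        self.rows.append([self.last_seen[n] for n in self.signal_names])

    def snapshot(self):
        """Return the queue as a list of rows, oldest first."""
        return [list(row) for row in self.rows]


queue = SignalQueue(["speed", "rpm"], maxlen=3)
queue.push({"speed": 10.0})   # rpm not yet observed
queue.push({"rpm": 900.0})    # speed is forward filled
queue.push({"speed": 11.0})   # rpm is forward filled
queue.push({"rpm": 950.0})    # oldest row is dropped
print(queue.snapshot())
```

Each CAN message carries only a few signals, so every row would otherwise be mostly empty. Forward filling gives a dense time-by-signal table at the cost of repeating stale values between updates.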

Significance and Impact: Modern vehicles have hundreds of electronic control units (ECUs), so they carry many CAN IDs and numerous associated signals. Securing all of them with an IDS incurs high implementation and computation costs. Securing only a handful of important signals from the vehicle's critical subsystems, such as the powertrain, engine, and coolant system, reduces complexity and makes real-time detection feasible. The practical challenge is then to design an effective detection pipeline around a selected group of signals. Accordingly, CANShield keeps track of only $m$ pre-selected high-priority signals. To build the shortlist, we assume that the defender has semantic knowledge of the critical signals. To make detection more effective and robust, CANShield adds further signals based on the correlation coefficient, starting with those most highly correlated with the critical signals. However, adding too many signals leads to an expensive and ineffective system. Therefore, $m$ is a design parameter chosen by the defender. Hereafter, the term “signals” refers only to the pre-selected $m$ signals.
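A rough sketch of this selection step follows. It assumes that candidates are ranked by the largest absolute Pearson correlation with any critical signal; that ranking rule, the function names, and the toy data are illustrative assumptions, not details confirmed by the source.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def select_signals(series, critical, m):
    """Keep the critical signals, then add the most correlated others up to m."""
    candidates = [name for name in series if name not in critical]

    def score(name):
        # Assumption: rank by the strongest |correlation| with any critical signal.
        return max(abs(pearson(series[name], series[c])) for c in critical)

    ranked = sorted(candidates, key=score, reverse=True)
    return list(critical) + ranked[: max(0, m - len(critical))]


# Hypothetical recorded data: signal name -> time series.
series = {
    "engine_rpm": [1, 2, 3, 4, 5],
    "throttle": [2, 4, 6, 8, 11],
    "wiper": [1, 0, 1, 0, 1],
    "fuel_rate": [5, 4, 3, 2, 1],
}
selected = select_signals(series, critical=["engine_rpm"], m=3)
print(selected)
```

Here `wiper` is uncorrelated with `engine_rpm` and is left out, while the two strongly correlated signals fill the remaining slots.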

The order of the signals in the resulting 2D input image can also affect learning efficacy. Compared with a random placement, a placement that brings out stronger spatial (correlation) patterns among the signals enables more effective learning. To facilitate learning of the inter-signal correlations, CANShield calculates the Pearson correlation matrix of the time-series signal dataset. Interpreting the correlation coefficient as a measure of distance between a pair of signals, CANShield uses a hierarchical agglomerative clustering algorithm to find clusters of highly correlated signals. The goal is to place highly correlated signals next to each other when building the 2D image, so that the small filters of the convolutional layers can learn signal-to-signal correlations effectively. Notably, the two tasks, signal selection and correlation-based clustering, are performed only once during initialization of the training process (i.e., offline with recorded data) and are not part of the deployment pipeline.
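The ordering step can be sketched in plain Python as below. The distance `1 - |r|` and average linkage are assumptions chosen for illustration, since the source does not state which distance or linkage CANShield uses. In practice a library routine such as SciPy's `scipy.cluster.hierarchy.linkage` would be the usual choice.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length sequences (0.0 if one is constant)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    if std_x == 0 or std_y == 0:
        return 0.0
    return cov / (std_x * std_y)


def correlation_order(series):
    """Order signals so that highly correlated ones end up adjacent."""
    names = list(series)
    # Assumed distance: 0 for perfectly correlated signals, 1 for uncorrelated.
    dist = {
        (a, b): 1.0 - abs(pearson(series[a], series[b]))
        for a in names
        for b in names
    }

    def average_linkage(c1, c2):
        return sum(dist[a, b] for a in c1 for b in c2) / (len(c1) * len(c2))

    clusters = [[name] for name in names]
    while len(clusters) > 1:
        # Merge the closest pair of clusters (agglomerative step).
        i, j = min(
            ((i, j) for i in range(len(clusters)) for j in range(i + 1, len(clusters))),
            key=lambda pair: average_linkage(clusters[pair[0]], clusters[pair[1]]),
        )
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)] + [merged]
    return clusters[0]


# Hypothetical signals, deliberately interleaved in the input.
series = {
    "speed": [1, 2, 3, 4, 5],
    "wiper": [0, 1, 0, 1, 0],
    "rpm": [10, 21, 29, 41, 50],
    "blinker": [0, 1, 0, 1, 1],
}
order = correlation_order(series)
print(order)
```

The returned list gives the row order for the 2D image: `speed` sits next to `rpm`, and `wiper` next to `blinker`, so a small convolutional filter covers related signals together.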

Research Details

  • To make the multidimensional signal-level time-series data suitable for a convolutional neural network (CNN)-based model, we convert the two-dimensional data queues into multiple images and treat detection as a computer vision-like problem. Multiple CNN-based autoencoder (AE) models learn the various temporal (short-term and long-term) and spatial (signal-wise) dependencies. Violations of either the temporal or the spatial pattern can be detected during the reconstruction process.
  • We propose a three-step analysis of the reconstruction loss of CANShield’s AE models to select detection thresholds for optimal accuracy, followed by an ensemble-based detector that boosts overall detection performance by combining the insights from all the AEs.
  • We evaluate CANShield against advanced signal-level attacks using real-world attack datasets and compare the results with a baseline model to show the improvements. The results show that CANShield is highly effective and responsive against a wide range of fabrication, masquerade, and suspension attacks on the CAN bus.
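One plausible way to turn a single data queue into several images with different temporal reach is to subsample it at different periods, as sketched below. The subsampling scheme, the function name `build_views`, the parameter names, and the signal-by-time image orientation are assumptions for illustration; the source states only that multiple images feed AEs covering short-term and long-term dependencies.

```python
def build_views(queue_rows, width, sampling_periods):
    """Build one signal-by-time image per sampling period (illustrative).

    queue_rows: list of rows, oldest first, each row holding one value per signal.
    width: number of time steps (columns) in every image.
    sampling_periods: e.g. [1, 4]; a larger period covers a longer time span.
    """
    views = {}
    for period in sampling_periods:
        needed = (width - 1) * period + 1
        if len(queue_rows) < needed:
            raise ValueError(f"need {needed} rows for period {period}")
        # Walk backwards from the newest row, keeping every `period`-th row.
        picked = queue_rows[::-1][::period][:width][::-1]
        # Transpose time x signal into signal x time so each signal is an image row.
        views[period] = [list(column) for column in zip(*picked)]
    return views


# Toy queue: 9 time steps, 2 signals.
rows = [[t, 10 * t] for t in range(9)]
views = build_views(rows, width=3, sampling_periods=[1, 4])
print(views[1])  # [[6, 7, 8], [60, 70, 80]]
print(views[4])  # [[0, 4, 8], [0, 40, 80]]
```

Both images have the same shape, so autoencoders of identical architecture could be trained on them, with the period-1 view exposing short-term changes and the period-4 view exposing slower trends.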

Facility: Real-world attack datasets were generated at the National Transportation Research Center at ORNL.

Sponsor/Funding: DOE ASCR

Team: Md Hasan Shahriar, Yang Xiao, Wenjing Lou, Thomas Hou (Virginia Tech), Pablo Moriano (ORNL)

Citation and DOI: Md Hasan Shahriar, Yang Xiao, Pablo Moriano, Wenjing Lou, Thomas Hou, CANShield: Signal-based Intrusion Detection for Controller Area Networks, Embedded Security in Cars (ESCAR) USA, 2022, DOI: https://doi.org/10.48550/arXiv.2205.01306 
 

CANShield workflow. The tasks with AEs and thresholds differ during “training” and “deployment” phases.

Summary: Modern vehicles rely on a fleet of electronic control units (ECUs) connected through controller area network (CAN) buses for critical vehicular control. However, with the expansion of advanced connectivity features in automobiles and the elevated risk of internal system exposure, the CAN bus is increasingly prone to intrusions and injection attacks. Ordinary injection attacks disrupt the typical timing properties of the CAN data stream, so rule-based intrusion detection systems (IDS) can easily detect them. Advanced attackers, however, can inject false data into the time-series sensor data (signals) while keeping the pattern and frequency of the CAN messages innocuous. Such attacks can bypass a rule-based IDS or any anomaly-based IDS built on binary payload data. To make vehicles robust against such intelligent attacks, we propose CANShield, a signal-based intrusion detection framework for the CAN bus. CANShield consists of three modules: a data preprocessing module that handles the high-dimensional CAN data stream at the signal level and makes it suitable for a deep learning model; a data analyzer module consisting of multiple deep autoencoder (AE) networks, each analyzing the time-series data from a different temporal perspective; and an attack detection module that uses an ensemble method to make the final decision. Evaluation results on two high-fidelity signal-based CAN attack datasets show the high accuracy and responsiveness of CANShield in detecting a wide range of advanced intrusion attacks.
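The final decision stage can be illustrated with a simplified sketch. It assumes one scalar reconstruction loss per AE, a per-model threshold taken from a high quantile of losses on benign data, and a vote across models. All three choices, along with the function names and numbers, are illustrative assumptions; CANShield's own threshold analysis and ensemble rule are more detailed than this.

```python
def quantile_threshold(benign_losses, fraction):
    """Loss value below which roughly `fraction` of benign samples fall."""
    ordered = sorted(benign_losses)
    index = min(len(ordered) - 1, int(fraction * len(ordered)))
    return ordered[index]


def ensemble_decision(losses_by_model, thresholds, min_vote_fraction=0.5):
    """Flag an attack when enough models exceed their own threshold."""
    votes = [losses_by_model[name] > thresholds[name] for name in thresholds]
    return sum(votes) / len(votes) >= min_vote_fraction


# Hypothetical reconstruction losses measured on attack-free training data.
benign = {
    "short_term_ae": [0.010, 0.020, 0.015, 0.030, 0.012],
    "long_term_ae": [0.050, 0.040, 0.060, 0.045, 0.055],
}
thresholds = {name: quantile_threshold(losses, 0.95) for name, losses in benign.items()}

# Hypothetical losses observed at deployment time.
is_attack = ensemble_decision({"short_term_ae": 0.20, "long_term_ae": 0.10}, thresholds)
is_normal = ensemble_decision({"short_term_ae": 0.01, "long_term_ae": 0.05}, thresholds)
print(is_attack, is_normal)  # True False
```

Thresholds are fixed during training and only compared against during deployment, which matches the workflow note that the AE and threshold tasks differ between the two phases.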