Deep Mining: Detecting Anomalous Patterns in Neural Network Activations with Subset Scanning

This work views neural networks as data-generating systems and applies anomalous pattern detection techniques to that data in order to detect when a network is processing a group of anomalous inputs. Detecting anomalies is a critical component of multiple machine learning problems, including detecting the presence of adversarial noise added to inputs. More broadly, this work is a step towards giving neural networks the ability to detect groups of out-of-distribution samples. This work introduces "Subset Scanning" methods from the anomalous pattern detection domain to the task of detecting anomalous inputs to neural networks. Subset Scanning allows us to answer the question: "Which subset of inputs have larger-than-expected activations at which subset of nodes?" Framing the adversarial detection problem this way allows us to identify systematic patterns in the activation space that span multiple adversarially noised images. Such images are "weird together". Leveraging this common anomalous pattern, we show increased detection power as the proportion of noised images in a test set increases. Detection power and accuracy results are provided for targeted adversarial noise added to CIFAR-10 images on a 20-layer ResNet using the Basic Iterative Method attack.
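
The scan over activations can be sketched roughly as follows. This is a minimal illustration under assumptions not spelled out in the abstract: right-tailed empirical p-values are computed for test-image activations against clean "background" activations, subsets are scored with a Berk-Jones style nonparametric scan statistic, and the search here only scans over subsets of images (the full method also scans over subsets of nodes). The helper names (`empirical_pvalues`, `berk_jones`, `scan_rows`) are illustrative, not taken from the paper.

```python
import numpy as np

def empirical_pvalues(background, observed):
    """Right-tailed empirical p-values of observed activations.

    background: (B, D) activations from clean images at one layer.
    observed:   (N, D) activations from the test images at the same layer.
    Returns an (N, D) matrix; small p-values mean larger-than-expected activations.
    """
    B = background.shape[0]
    pvals = np.empty(observed.shape, dtype=float)
    for j in range(observed.shape[1]):
        # Fraction of background activations >= each observed value (add-one smoothing).
        greater = (background[:, j][None, :] >= observed[:, j][:, None]).sum(axis=1)
        pvals[:, j] = (greater + 1.0) / (B + 1.0)
    return pvals

def berk_jones(n_alpha, n, alpha):
    """Berk-Jones style statistic: n * KL(observed fraction || alpha), right-tailed."""
    q = n_alpha / n
    if q <= alpha:
        return 0.0
    score = q * np.log(q / alpha)
    if q < 1.0:
        score += (1.0 - q) * np.log((1.0 - q) / (1.0 - alpha))
    return n * score

def scan_rows(pvals, alpha_max=0.5, n_thresholds=20):
    """For each significance threshold alpha, sort images by their count of
    significant p-values and test prefix subsets (the linear-time subset
    scanning shortcut); return the highest-scoring subset found."""
    n_images, n_nodes = pvals.shape
    best_score, best_rows, best_alpha = 0.0, None, None
    for alpha in np.linspace(0.05, alpha_max, n_thresholds):
        counts = (pvals < alpha).sum(axis=1)      # per-image significant activations
        order = np.argsort(-counts)               # most anomalous images first
        cum = np.cumsum(counts[order])
        for k in range(1, n_images + 1):
            score = berk_jones(cum[k - 1], k * n_nodes, alpha)
            if score > best_score:
                best_score, best_rows, best_alpha = score, order[:k], alpha
    return best_score, best_rows, best_alpha
```

In use, one would extract activations at a chosen layer for clean background images and for a test set, convert the test activations to p-values with `empirical_pvalues`, and call `scan_rows`; a high returned score suggests a subset of test images that are anomalous together in activation space.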
