Out-of-Distribution Detection with Distance Guarantee in Deep Generative Models

It is challenging to detect anomalies (or out-of-distribution (OOD) data) in deep generative models (DGMs), including flow-based models and variational autoencoders (VAEs). In this paper, we prove that, for a well-trained flow-based model, the distance between the distribution of representations of an OOD dataset and the prior can be large enough, provided that the distance between the distributions of the training dataset and the OOD dataset is large enough. Since the most commonly used prior in flow-based models is factorized, the distribution of representations of an OOD dataset tends to be non-factorized when it is far from the prior. Furthermore, we observe that the distribution of the representations of OOD datasets in flow-based models is also Gaussian-like. Based on our theorem and this key observation, we propose an easy-to-perform method for both group and point-wise anomaly detection that estimates the total correlation of representations in DGMs. We have conducted extensive experiments on prevalent benchmarks to evaluate our method. For group anomaly detection (GAD), our method achieves near 100% AUROC on all problems and is robust against data manipulation. In contrast, the state-of-the-art (SOTA) GAD method performs no better than random guessing on challenging problems and can be attacked by data manipulation in almost all cases. For point-wise anomaly detection (PAD), our method is comparable to the SOTA PAD method on one category of problems and achieves near 100% AUROC on another category of problems where the SOTA PAD method fails.
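
The total-correlation idea can be illustrated with a minimal sketch, assuming the representations are well approximated by a multivariate Gaussian. For a Gaussian with correlation matrix R, the total correlation equals -1/2 log det R, which is zero exactly when the coordinates are uncorrelated (that is, when the distribution factorizes). The function name `gaussian_total_correlation` and the synthetic data below are hypothetical stand-ins for representations produced by a trained model; this is not claimed to be the exact estimator used in the paper.

```python
import numpy as np


def gaussian_total_correlation(z):
    """Estimate total correlation of samples z with shape (n, d).

    Assumes (hypothetically) that z is Gaussian-like, in which case
    TC = -0.5 * log det(R), where R is the correlation matrix of z.
    """
    z = np.asarray(z, dtype=np.float64)
    corr = np.corrcoef(z, rowvar=False)      # (d, d) sample correlation matrix
    _sign, logdet = np.linalg.slogdet(corr)  # numerically stable log-determinant
    return -0.5 * logdet


# Synthetic stand-ins for latent representations (not real model outputs).
rng = np.random.default_rng(0)
d = 8
factorized = rng.standard_normal((5000, d))  # mimics a factorized prior
mix = np.eye(d) + 0.5 * np.ones((d, d))      # linear mixing introduces dependence
correlated = factorized @ mix                # mimics non-factorized representations

tc_factorized = gaussian_total_correlation(factorized)
tc_correlated = gaussian_total_correlation(correlated)
print(f"factorized: {tc_factorized:.4f}, correlated: {tc_correlated:.4f}")
```

In this toy setting the factorized batch yields a value close to zero and the mixed batch a clearly larger one, so thresholding the estimate separates the two groups. Any threshold for real data would have to be calibrated on held-out in-distribution batches.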
