Multi-task Multi-modal Models for Collective Anomaly Detection

This paper proposes a new framework for anomaly detection when collectively monitoring many complex systems. Condition-based monitoring in industrial applications requires the capability to (1) capture multiple operational states, (2) manage many similar but different assets, and (3) provide insight into the internal relationships among the variables. To meet these criteria, we propose a multi-task learning approach based on a sparse mixture of sparse Gaussian graphical models (GGMs). Unlike existing fused- and group-lasso-based approaches, our method represents each task as a sparse mixture of sparse GGMs and can therefore handle multi-modality. We develop a variational inference algorithm combined with a novel sparse mixture-weight selection algorithm. To address the shortcomings of the conventional automatic relevance determination (ARD) approach, we propose a new ℓ0-regularized formulation that guarantees sparsity in the mixture weights. We show that our framework eliminates the well-known numerical instability of the iterative procedure in mixture-model learning. We also demonstrate improved anomaly detection performance on real-world data sets. To the best of our knowledge, this is the first proposal of multi-task GGM learning that allows multi-modal distributions.
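The building block of the proposed framework is a sparse GGM estimated with the graphical lasso, combined into a mixture so that multiple operational states can be captured. The sketch below is a rough, simplified illustration of that idea only, not the paper's variational inference or ℓ0-regularized weight selection: it clusters the data, fits one graphical-lasso GGM per cluster, and scores a new sample by its negative log-likelihood under the best-matching component. The function names and parameters (fit_sparse_ggm_mixture, anomaly_score, alpha, n_components) are illustrative assumptions, not part of the paper.

```python
# Minimal sketch (assumption, not the paper's algorithm): a crude "mixture of
# sparse GGMs" built by clustering and then fitting graphical lasso per cluster.
import numpy as np
from sklearn.mixture import GaussianMixture
from sklearn.covariance import GraphicalLasso


def fit_sparse_ggm_mixture(X, n_components=2, alpha=0.05):
    """Cluster the data, then fit one sparse GGM (graphical lasso) per cluster."""
    gmm = GaussianMixture(n_components=n_components, random_state=0).fit(X)
    labels = gmm.predict(X)
    components = []
    for k in range(n_components):
        ggm = GraphicalLasso(alpha=alpha).fit(X[labels == k])
        weight = float(np.mean(labels == k))          # empirical mixture weight
        components.append((weight, ggm))
    return components


def anomaly_score(x, components):
    """Negative log-likelihood of x under the most plausible mixture component."""
    d = x.shape[0]
    scores = []
    for weight, ggm in components:
        diff = x - ggm.location_                      # centered sample
        P = ggm.precision_                            # sparse precision matrix
        _, logdet = np.linalg.slogdet(P)
        nll = 0.5 * (diff @ P @ diff - logdet + d * np.log(2 * np.pi)) - np.log(weight)
        scores.append(nll)
    return min(scores)                                # higher = more anomalous
```

As a usage example, one would call fit_sparse_ggm_mixture on training data from normal operation and flag test samples whose anomaly_score exceeds a threshold chosen on held-out data; the zeros of each precision_ matrix indicate which variable pairs are conditionally independent within that operational state.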
