Infinite Bayesian one-class support vector machine based on Dirichlet process mixture clustering

Abstract In the problem of one-class classification, a one-class classifier (OCC) tries to identify objects of a specific class, called the target class, by learning from a training set containing only objects of that class. However, some traditional OCCs lose their effectiveness when the target class follows a multimodal distribution, and their performance is sensitive to the model parameters. To solve this problem and enhance classification performance, a novel OCC, the infinite Bayesian one-class support vector machine (IB-OCSVM), is proposed in this study. In IB-OCSVM, we partition the input data into several clusters with a Dirichlet process mixture (DPM) and learn a modified OCSVM in each cluster. Specifically, the clustering procedure and the modified OCSVMs are jointly learned in a unified Bayesian framework to guarantee the consistency of clustering and linear separability within each cluster. The parameters can be inferred simply and effectively via Gibbs sampling. Meanwhile, we modify the traditional OCSVM by replacing the kernel transformation with a new feature transformation based on the DPM clustering result. In the feature space of the modified OCSVM, the transformed samples preserve the same clustering structure as the original samples, thus guaranteeing the combination of clustering and classification in our method. Experimental results on benchmark datasets and real synthetic aperture radar data demonstrate that the proposed method achieves better classification performance and is more robust to the model parameters than several conventional methods.
