Structural Minimax Probability Machine

The minimax probability machine (MPM) is a discriminative classifier based on generative prior knowledge. It directly estimates a bound on the classification accuracy by minimizing the maximum probability of misclassification. The structural information of data is an effective way to represent prior knowledge and has been found to be vital for designing classifiers in real-world problems. However, MPM describes each class only by a single prior distribution with a given mean and covariance matrix, so it does not efficiently exploit the structural information of the data. In this paper, we use two finite mixture models, one per class, to capture the structural information of the data in binary classification. For each subdistribution in a finite mixture model, only its mean and covariance matrix are assumed to be known. Based on the finite mixture models, we propose a structural MPM (SMPM). SMPM can be solved effectively by a sequence of second-order cone programming problems. Moreover, we extend the linear SMPM model to a nonlinear model by exploiting kernelization techniques. We also show that SMPM can be interpreted as a large margin classifier and can be transformed into the support vector machine and the maxi-min margin machine under certain special conditions. Experimental results on both synthetic and real-world data sets demonstrate the effectiveness of SMPM.
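For orientation, the baseline MPM that SMPM builds on can be sketched in a few lines. This is not the SMPM algorithm of this paper; it is a minimal sketch of the standard linear MPM from the earlier MPM literature, which minimizes sqrt(a'Σ1 a) + sqrt(a'Σ2 a) subject to a'(μ1 - μ2) = 1 and reports the worst-case accuracy bound α = κ²/(1 + κ²). The function name `fit_linear_mpm`, its parameters, and the iterative least-squares solver (used here in place of a second-order cone programming solver) are illustrative assumptions, and both covariance matrices are assumed to be positive definite.

```python
import numpy as np


def fit_linear_mpm(mu1, cov1, mu2, cov2, n_iter=200, tol=1e-10):
    """Sketch of the baseline linear MPM (hypothetical helper, not SMPM).

    Returns (a, b, kappa, alpha) for the decision rule sign(a @ x - b),
    where alpha is the worst-case bound on classification accuracy.
    Assumes cov1 and cov2 are positive definite and mu1 != mu2.
    """
    mu1, mu2 = np.asarray(mu1, float), np.asarray(mu2, float)
    cov1, cov2 = np.asarray(cov1, float), np.asarray(cov2, float)

    d = mu1 - mu2
    a0 = d / (d @ d)  # particular solution of a @ d == 1
    # Columns of F span the subspace orthogonal to d, so a = a0 + F @ u
    # satisfies the equality constraint for every u.
    _, _, vt = np.linalg.svd(d.reshape(1, -1))
    F = vt[1:].T

    a = a0.copy()
    prev = np.inf
    for _ in range(n_iter):
        beta1 = np.sqrt(a @ cov1 @ a)
        beta2 = np.sqrt(a @ cov2 @ a)
        obj = beta1 + beta2
        if prev - obj < tol:  # objective has stopped decreasing
            break
        prev = obj
        # Reweighted least-squares step on the unconstrained variable u.
        M = cov1 / beta1 + cov2 / beta2
        G = F.T @ M @ F + 1e-12 * np.eye(F.shape[1])  # tiny ridge for stability
        u = -np.linalg.solve(G, F.T @ M @ a0)
        a = a0 + F @ u

    s1 = np.sqrt(a @ cov1 @ a)
    s2 = np.sqrt(a @ cov2 @ a)
    kappa = 1.0 / (s1 + s2)
    alpha = kappa**2 / (1.0 + kappa**2)  # worst-case accuracy bound
    b = a @ mu1 - kappa * s1             # offset of the separating hyperplane
    return a, b, kappa, alpha
```

For two classes with identity covariance and means (2, 0) and (0, 0), the sketch places the hyperplane halfway between the means. SMPM, as described above, would replace the single mean and covariance per class with those of each mixture component.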
