Gini Support Vector Machine: Quadratic Entropy Based Robust Multi-Class Probability Regression

Many classification tasks require estimates of output class probabilities, either as confidence scores or for inference integrated with other models. Probability estimates derived from large-margin classifiers such as support vector machines (SVMs) are often unreliable. We extend SVM large-margin classification to GiniSVM, a maximum-entropy multi-class probability regression method. GiniSVM combines a quadratic (Gini-Simpson) entropy-based agnostic model with a kernel-based similarity model. A form of Huber loss in the GiniSVM primal formulation reveals a connection to robust estimation, which is further corroborated by the impulsive-noise filtering property of the reverse water-filling procedure used to obtain normalized classification margins. These normalized margins directly provide estimates of class-conditional probabilities, approximating kernel logistic regression (KLR) at reduced computational cost. As with other SVMs, GiniSVM produces a sparse kernel expansion and is trained by solving a quadratic program under linear constraints. Training is implemented efficiently by sequential minimal optimization or by growth transformation on probability functions. Results on synthetic and benchmark data, including speaker verification and face detection data, show improved classification performance and increased tolerance to imprecision compared with soft-margin SVM and KLR.
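The normalization step mentioned in the abstract can be illustrated with a small sketch. It assumes the usual form of reverse water-filling: given per-class margins f_k and a budget gamma > 0, find the level z such that the clipped margins max(f_k - z, 0) sum to gamma, then divide by gamma to obtain values that sum to one. The function names, the parameter gamma, and the sort-based search are illustrative choices and are not taken from the abstract; the Gini-Simpson entropy is written in its common form 1 - sum(p_k^2).

```python
from typing import List


def gini_simpson_entropy(p: List[float]) -> float:
    """Quadratic (Gini-Simpson) entropy, 1 - sum_k p_k^2, of a probability vector."""
    return 1.0 - sum(q * q for q in p)


def reverse_water_fill(f: List[float], gamma: float) -> List[float]:
    """Normalize margins f by reverse water-filling (illustrative sketch).

    Finds z such that sum_k max(f_k - z, 0) == gamma and returns
    max(f_k - z, 0) / gamma, which is non-negative and sums to one.
    """
    if not f:
        raise ValueError("f must contain at least one margin")
    if gamma <= 0:
        raise ValueError("gamma must be positive")

    s = sorted(f, reverse=True)  # margins in decreasing order
    cumsum = 0.0
    z = s[0] - gamma
    for m, value in enumerate(s, start=1):
        cumsum += value
        # Level that would make the m largest margins sum to gamma after clipping.
        z = (cumsum - gamma) / m
        # Accept once every remaining margin lies at or below the level.
        if m == len(s) or s[m] <= z:
            break
    return [max(v - z, 0.0) / gamma for v in f]


if __name__ == "__main__":
    margins = [2.0, 1.5, -3.0]
    probs = reverse_water_fill(margins, gamma=1.0)
    print(probs)                        # the strongly negative margin is clipped to zero
    print(gini_simpson_entropy(probs))
```

In this sketch a small gamma concentrates the mass on the largest margins, while a large gamma keeps more classes above the level z and spreads the estimate more evenly; classes whose margins fall below z receive exactly zero.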
