Non-crossing large-margin probability estimation and its application to robust SVM via preconditioning.

Many large-margin classifiers, such as the Support Vector Machine (SVM), sidestep the estimation of conditional class probabilities and target the classification boundary directly. However, conditional class probabilities are useful in many applications. Wang, Shen, and Liu (2008) bridged this gap by providing an interval estimator of the conditional class probability via bracketing: different weights are assigned to the positive and negative classes, and the corresponding weighted large-margin classifiers are trained. They proposed estimating the weighted large-margin classifiers individually. Empirically, however, the individually estimated classification boundaries may cross each other even though, theoretically, they should not. In this work, we propose a technique that ensures the estimated classification boundaries do not cross. Furthermore, we take advantage of the estimated conditional class probabilities to precondition our training data. The standard SVM is then applied to the preconditioned training data to achieve robustness. Simulations and real data illustrate the finite-sample performance of the proposed methods.
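The bracketing and preconditioning steps described above can be sketched as follows. This is a toy illustration, not the paper's method: it uses scikit-learn's weighted `SVC` to fit the weighted classifiers individually (so it does not enforce the non-crossing constraint), approximates the probability estimate by the fraction of weights at which a point is classified positive, and then relabels the data by the estimated probabilities before a final standard SVM fit. All dataset parameters and the weight grid are illustrative assumptions.

```python
import numpy as np
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Toy data: two Gaussian classes with 10% label noise injected.
n = 150
X = np.vstack([rng.normal(-1.0, 1.0, (n, 2)), rng.normal(1.0, 1.0, (n, 2))])
y = np.hstack([-np.ones(n), np.ones(n)])
flip = rng.random(2 * n) < 0.10
y[flip] *= -1

# Step 1: weighted SVMs over a grid of weights pi in (0, 1).
# With weight (1 - pi) on class +1 and pi on class -1, the population
# minimizer satisfies sign(f_pi(x)) = sign(p(x) - pi), so the signs
# across the grid bracket the conditional class probability p(x).
pis = np.linspace(0.05, 0.95, 19)
signs = np.empty((len(pis), len(X)))
for j, pi in enumerate(pis):
    clf = SVC(kernel="rbf", C=1.0, gamma="scale",
              class_weight={-1: pi, 1: 1 - pi})
    clf.fit(X, y)
    signs[j] = np.sign(clf.decision_function(X))

# Step 2: crude point estimate of p(x): the fraction of grid weights at
# which x is classified positive. (Exact bracketing would report the
# interval endpoints; non-crossing guarantees monotone signs in pi.)
p_hat = (signs > 0).mean(axis=0)

# Step 3: preconditioning -- relabel each training point according to its
# estimated probability, then fit a standard SVM to the cleaned labels.
y_clean = np.where(p_hat >= 0.5, 1.0, -1.0)
robust_svm = SVC(kernel="rbf", C=1.0, gamma="scale").fit(X, y_clean)

print("labels changed by preconditioning:", int((y_clean != y).sum()))
```

The relabeling step is what confers robustness: points whose observed labels contradict the estimated probabilities (e.g., noisy labels) are corrected before the final fit, so they no longer pull the standard SVM's boundary toward them.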

[1] Hao Helen Zhang, et al. Variable selection for the multicategory SVM via adaptive sup-norm regularization, 2008, 0803.3676.

[2] Yoav Freund, et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[3] J. Friedman. Special Invited Paper: Additive logistic regression: A statistical view of boosting, 2000.

[4] W. Wong, et al. On ψ-Learning, 2003.

[5] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[6] Xiwu Lin, et al. Smoothing spline ANOVA models for large data sets with Bernoulli observations and the randomized GACV, 2000.

[7] Yi Lin, et al. Support Vector Machines for Classification in Nonstandard Situations, 2002, Machine Learning.

[8] R. Tibshirani, et al. "Preconditioning" for feature selection and regression in high-dimensional problems, 2007, math/0703858.

[9] G. Wahba. Support vector machines, reproducing kernel Hilbert spaces, and randomized GACV, 1999.

[10] Robert Tibshirani, et al. 1-norm Support Vector Machines, 2003, NIPS.

[11] Yufeng Liu, et al. Multicategory ψ-Learning, 2006.

[12] Eric R. Ziegel, et al. Generalized Linear Models, 2002, Technometrics.

[13] Ji Zhu, et al. Kernel Logistic Regression and the Import Vector Machine, 2001, NIPS.

[14] Yi Lin, et al. Support Vector Machines and the Bayes Rule in Classification, 2002, Data Mining and Knowledge Discovery.

[15] Nello Cristianini, et al. An Introduction to Support Vector Machines and Other Kernel-based Learning Methods, 2000.

[16] B. Schölkopf, et al. Advances in kernel methods: support vector learning, 1999.

[17] R. Tibshirani. Regression Shrinkage and Selection via the Lasso, 1996.

[18] V. Vapnik. Pattern recognition using generalized portrait method, 1963.

[19] Vladimir Vapnik, et al. Statistical learning theory, 1998.

[20] Yufeng Liu, et al. Probability estimation for large-margin classifiers, 2008.

[21] Yufeng Liu, et al. Robust Truncated Hinge Loss Support Vector Machines, 2007.

[22] Jianqing Fan, et al. Variable Selection via Nonconcave Penalized Likelihood and its Oracle Properties, 2001.