Pattern Classification With Class Probability Output Network

The output of a classifier is usually the value of a discriminant function, and a decision is made from this value even though it does not necessarily represent the posterior probability needed for a soft classification decision. It is therefore desirable to calibrate the classifier's output so that it can be interpreted as the posterior probability of class membership. This paper presents a new postprocessing method for the probabilistic scaling of a classifier's output. The output of the classifier is analyzed, and its distribution is described by the parameters of a beta distribution. To approximate the class output distribution more accurately, the beta distribution parameters and the kernel parameters of the discriminant function are adjusted to improve the uniformity of the beta cumulative distribution function (CDF) values for the given class output samples. The resulting classifier, referred to as the class probability output network (CPON), can provide accurate posterior probabilities for the soft decision of classification. To show the effectiveness of the proposed method, simulations of pattern classification with support vector machine (SVM) classifiers were performed on University of California at Irvine (UCI) data sets. The SVM classifiers with the proposed CPON demonstrated a statistically meaningful performance improvement over the SVM and SVM-related classifiers, as well as over other probabilistic scaling methods.
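
The beta-fitting and uniformity-checking steps described above can be illustrated with a minimal sketch. It is not the CPON training procedure itself: it does not adjust kernel parameters or combine class CDFs into a posterior. It assumes NumPy and SciPy are available, and the helper names (`normalize_outputs`, `fit_beta`, `uniformity_check`), the linear normalization with a small margin, the maximum-likelihood fit, and the Kolmogorov-Smirnov test as the uniformity measure are all illustrative choices that the abstract does not specify.

```python
import numpy as np
from scipy import stats


def normalize_outputs(scores, margin=0.01):
    """Linearly map raw discriminant values into [margin, 1 - margin].

    The margin (an assumption of this sketch) keeps every sample strictly
    inside (0, 1), which the beta fit below requires.
    """
    scores = np.asarray(scores, dtype=float)
    scaled = (scores - scores.min()) / (scores.max() - scores.min())
    return margin + (1.0 - 2.0 * margin) * scaled


def fit_beta(y):
    """Maximum-likelihood beta fit with support fixed to (0, 1)."""
    a, b, _, _ = stats.beta.fit(y, floc=0, fscale=1)
    return a, b


def uniformity_check(y, a, b):
    """Test whether the beta CDF values of y look Uniform(0, 1).

    If y really follows Beta(a, b), then F(y) is uniformly distributed,
    so a small KS statistic indicates a good description of the outputs.
    """
    u = stats.beta.cdf(y, a, b)
    return stats.kstest(u, "uniform")


# Synthetic stand-in for the discriminant values of one class
# (hypothetical data, not taken from the paper's experiments).
rng = np.random.default_rng(0)
raw_scores = rng.normal(loc=1.0, scale=0.5, size=500)

y = normalize_outputs(raw_scores)
a, b = fit_beta(y)
result = uniformity_check(y, a, b)
print(f"a={a:.3f}, b={b:.3f}, KS statistic={result.statistic:.4f}")
```

Because the beta parameters are estimated from the same samples that are tested, the KS p-value is optimistic; in this sketch the statistic is better read as a relative measure of fit than as a formal test.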
