Convex Optimization for Binary Classifier Aggregation in Multiclass Problems

Multiclass problems are often decomposed into multiple binary problems that are solved by individual binary classifiers whose results are integrated into a final answer. Various methods, including all-pairs (APs), one-versus-all (OVA), and error correcting output code (ECOC), have been studied, to decompose multiclass problems into binary problems. However, little study has been made to optimally aggregate binary problems to determine a final answer to the multiclass problem. In this paper we present a convex optimization method for an optimal aggregation of binary classifiers to estimate class membership probabilities in multiclass problems. We model the class membership probability as a softmax function which takes a conic combination of discrepancies induced by individual binary classifiers, as an input. With this model, we formulate the regularized maximum likelihood estimation as a convex optimization problem, which is solved by the primal-dual interior point method. Connections of our method to large margin classifiers are presented, showing that the large margin formulation can be considered as a limiting case of our convex formulation. Numerical experiments on synthetic and real-world data sets demonstrate that our method outperforms existing aggregation methods as well as direct methods, in terms of the classification accuracy and the quality of class membership probability estimates.

[1]  Trevor Hastie,et al.  Regularization Paths for Generalized Linear Models via Coordinate Descent. , 2010, Journal of statistical software.

[2]  André Carlos Ponce de Leon Ferreira de Carvalho,et al.  A review on the combination of binary classifiers in multiclass problems , 2008, Artificial Intelligence Review.

[3]  Bianca Zadrozny,et al.  Obtaining calibrated probability estimates from decision trees and naive Bayesian classifiers , 2001, ICML.

[4]  Hsuan-Tien Lin,et al.  A note on Platt’s probabilistic outputs for support vector machines , 2007, Machine Learning.

[5]  John Platt,et al.  Probabilistic Outputs for Support vector Machines and Comparisons to Regularized Likelihood Methods , 1999 .

[6]  J. William Ahwood,et al.  CLASSIFICATION , 1931, Foundations of Familiar Language.

[7]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[8]  Bianca Zadrozny,et al.  Transforming classifier scores into accurate multiclass probability estimates , 2002, KDD.

[9]  Shin Ishii,et al.  A Multiclass Classification Method Based on Decoding of Binary Classifiers , 2009, Neural Computation.

[10]  Dragos D. Margineantu,et al.  Class Probability Estimation and Cost-Sensitive Classification Decisions , 2002, ECML.

[11]  B. Zadrozny Reducing multiclass to binary by coupling probability estimates , 2001, NIPS.

[12]  Stephen P. Boyd,et al.  Subgradient Methods , 2007 .

[13]  Mark W. Schmidt,et al.  An interior-point stochastic approximation method and an L1-regularized delta rule , 2008, NIPS.

[14]  Pedro M. Domingos MetaCost: a general method for making classifiers cost-sensitive , 1999, KDD '99.

[15]  Chih-Jen Lin,et al.  Generalized Bradley-Terry Models and Multi-Class Probability Estimates , 2006, J. Mach. Learn. Res..

[16]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[17]  D K Smith,et al.  Numerical Optimization , 2001, J. Oper. Res. Soc..

[18]  R. A. Bradley,et al.  RANK ANALYSIS OF INCOMPLETE BLOCK DESIGNS THE METHOD OF PAIRED COMPARISONS , 1952 .

[19]  M. J. D. Powell,et al.  Nonlinear Programming—Sequential Unconstrained Minimization Techniques , 1969 .

[20]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[21]  T. Poggio,et al.  Multiclass cancer diagnosis using tumor gene expression signatures , 2001, Proceedings of the National Academy of Sciences of the United States of America.

[22]  Gérard Dreyfus,et al.  Handwritten digit recognition by neural networks with single-layer training , 1992, IEEE Trans. Neural Networks.

[23]  Robert Tibshirani,et al.  Classification by Pairwise Coupling , 1997, NIPS.

[24]  Tong Zhang,et al.  Text Categorization Based on Regularized Linear Classification Methods , 2001, Information Retrieval.

[25]  Yiming Yang,et al.  Modified Logistic Regression: An Approximation to SVM and Its Applications in Large-Scale Text Categorization , 2003, ICML.

[26]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[27]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[28]  Jason Weston,et al.  Large Scale Transductive SVMs , 2006, J. Mach. Learn. Res..

[29]  Peter L. Bartlett,et al.  Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , 2003, J. Mach. Learn. Res..

[30]  Chih-Jen Lin,et al.  A comparison of methods for multiclass support vector machines , 2002, IEEE Trans. Neural Networks.

[31]  G. Brier VERIFICATION OF FORECASTS EXPRESSED IN TERMS OF PROBABILITY , 1950 .

[32]  Narendra Karmarkar,et al.  A new polynomial-time algorithm for linear programming , 1984, Comb..

[33]  Stephen P. Boyd,et al.  A tutorial on geometric programming , 2007, Optimization and Engineering.

[34]  K. Schittkowski,et al.  NONLINEAR PROGRAMMING , 2022 .

[35]  Shigeo Abe DrEng Pattern Classification , 2001, Springer London.

[36]  J. Downing,et al.  Classification, subtype discovery, and prediction of outcome in pediatric acute lymphoblastic leukemia by gene expression profiling. , 2002, Cancer cell.

[37]  Shin Ishii,et al.  Optimal Aggregation of Binary Classifiers for Multiclass Cancer Diagnosis Using Gene Expression Profiles , 2009, IEEE/ACM Transactions on Computational Biology and Bioinformatics.

[38]  Eric R. Ziegel,et al.  Generalized Linear Models , 2002, Technometrics.

[39]  Stephen P. Boyd,et al.  An Interior-Point Method for Large-Scale l1-Regularized Logistic Regression , 2007, J. Mach. Learn. Res..

[40]  Sunho Park,et al.  Bayesian Aggregation of Binary Classifiers , 2010, 2010 IEEE International Conference on Data Mining.

[41]  Stephen P. Boyd,et al.  Convex Optimization , 2004, Algorithms and Theory of Computation Handbook.