Bayesian Aggregation of Binary Classifiers

Multiclass classification problems are often decomposed into multiple binary problems, each solved by an individual binary classifier, whose results are then integrated into a final answer. Various methods have been developed to aggregate binary classifiers, including voting heuristics, loss-based decoding, and probabilistic decoding, but little work has been done on optimal aggregation. In this paper we present a Bayesian method for optimally aggregating binary classifiers, in which class membership probabilities are determined by predictive probabilities. We model the class membership probability as a softmax function whose argument is a linear combination of discrepancies between code words and the probability estimates obtained by the binary classifiers. We consider a lower bound on the softmax function, represented as a product of logistic sigmoids, and formulate the problem of learning the aggregation weights as variational logistic regression. Predictive probabilities computed by variational logistic regression yield the class membership probabilities. We stress two notable advantages over existing methods, in terms of complexity and overfitting. Numerical experiments on several datasets confirm the usefulness of the method.
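The softmax model and its sigmoid-product lower bound can be illustrated with a minimal sketch. The abstract does not spell out the details, so several choices here are assumptions: the discrepancy is taken as the absolute difference between a code word entry and the corresponding probability estimate, the weights are fixed to ones rather than learned by variational logistic regression, and the bound shown is the standard inequality softmax_k(a) >= prod_{j != k} sigmoid(a_k - a_j), which may differ in form from the one used in the paper. All function and variable names are hypothetical.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def class_scores(code_matrix, prob_estimates, weights):
    # Assumed discrepancy: |code word entry - binary classifier estimate|.
    # code_matrix has shape (K classes, M binary classifiers).
    discrepancy = np.abs(code_matrix - prob_estimates)
    # A smaller weighted discrepancy gives a larger score for that class.
    return -discrepancy @ weights

def softmax(scores):
    shifted = scores - scores.max()  # subtract the max for numerical stability
    e = np.exp(shifted)
    return e / e.sum()

def sigmoid_product_bound(scores):
    # diffs[k, j] = a_k - a_j; the diagonal term is excluded from the product.
    diffs = scores[:, None] - scores[None, :]
    s = sigmoid(diffs)
    np.fill_diagonal(s, 1.0)
    return s.prod(axis=1)

# One-vs-rest code matrix for three classes (illustrative values only).
code_matrix = np.eye(3)
prob_estimates = np.array([0.7, 0.2, 0.1])
weights = np.ones(3)  # placeholder; the paper learns these weights

scores = class_scores(code_matrix, prob_estimates, weights)
probs = softmax(scores)
bound = sigmoid_product_bound(scores)
print(probs, bound)
```

The bound follows from prod_j (1 + x_j) >= 1 + sum_j x_j for non-negative x_j, applied with x_j = exp(a_j - a_k). Because each factor is a logistic sigmoid, the bound turns the multiclass likelihood into a form that logistic regression machinery can handle.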
