Bayesian Classifier Combination

Bayesian model averaging linearly mixes the probabilistic predictions of multiple models, each weighted by its posterior probability. This is the coherent Bayesian way of combining multiple models only under certain restrictive assumptions, which we outline. We explore a general framework for Bayesian model combination (which differs from model averaging) in the context of classification. This framework explicitly models the relationship between each model’s output and the unknown true label. The framework does not require that the models be probabilistic (they can even be human assessors), that they share prior information or receive the same training data, or that they be independent in their errors. Finally, the Bayesian combiner does not need to believe any of the models is in fact correct. We test several variants of this classifier combination procedure starting from a classic statistical model proposed by Dawid and Skene (1979) and using MCMC to add more complex but important features to the model. Comparisons on several data sets to simpler methods like majority voting show that the Bayesian methods not only perform well but result in interpretable diagnostics on the data points and the models.
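As a concrete illustration of the starting point described above, the Python sketch below implements a Dawid-and-Skene-style Bayesian combiner with a simple Gibbs sampler: symmetric Dirichlet priors are placed on the class proportions and on each row of every base classifier's confusion matrix, and the sampler alternates between drawing the parameters and the latent true labels. This is a minimal sketch under assumed conventions, not the implementation evaluated in the paper; the function name gibbs_ibcc, the hyperparameters alpha and beta, and all other identifiers are illustrative.

import numpy as np

def gibbs_ibcc(C, J, n_iter=500, burn_in=100, alpha=1.0, beta=1.0, seed=0):
    """Gibbs sampler sketch for a Dawid-Skene-style Bayesian classifier combiner.

    C     : (N, K) integer array; C[i, k] is classifier k's hard label for item i.
    J     : number of classes.
    alpha : symmetric Dirichlet prior on the class proportions.
    beta  : symmetric Dirichlet prior on each row of each confusion matrix.
    Returns an (N, J) array of posterior probabilities for the true labels.
    """
    rng = np.random.default_rng(seed)
    N, K = C.shape

    # Initialise the latent true labels by majority vote.
    t = np.array([np.bincount(C[i], minlength=J).argmax() for i in range(N)])
    counts = np.zeros((N, J))

    for it in range(n_iter):
        # Class proportions p | t ~ Dirichlet(alpha + class counts).
        p = rng.dirichlet(alpha + np.bincount(t, minlength=J))

        # Confusion matrices pi[k, j] | t, C ~ Dirichlet(beta + observed counts),
        # where row j of pi[k] is classifier k's output distribution given true class j.
        pi = np.empty((K, J, J))
        for k in range(K):
            for j in range(J):
                obs = np.bincount(C[t == j, k], minlength=J)
                pi[k, j] = rng.dirichlet(beta + obs)

        # True labels t_i | p, pi, C: combine the prior with each classifier's likelihood.
        log_post = np.log(p) + np.zeros((N, J))
        for k in range(K):
            log_post += np.log(pi[k][:, C[:, k]]).T  # (N, J) likelihood term for classifier k
        log_post -= log_post.max(axis=1, keepdims=True)
        post = np.exp(log_post)
        post /= post.sum(axis=1, keepdims=True)
        t = np.array([rng.choice(J, p=post[i]) for i in range(N)])

        # Accumulate posterior label counts after burn-in.
        if it >= burn_in:
            counts[np.arange(N), t] += 1

    return counts / counts.sum(axis=1, keepdims=True)

Calling gibbs_ibcc on an N-by-K matrix of hard labels returns per-item posterior class probabilities. The sampled confusion matrices additionally expose each base model's error pattern, and low-confidence posteriors flag ambiguous data points, which is the kind of interpretable diagnostic referred to above.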

[1] A. P. Dawid, et al. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, 1979.

[2] Christian Genest, et al. Combining Probability Distributions: A Critique and an Annotated Bibliography, 1986.

[3] David H. Wolpert, et al. Stacked generalization, 1992, Neural Networks.

[4] Adam Krzyżak, et al. Methods of combining multiple classifiers and their applications to handwriting recognition, 1992, IEEE Trans. Syst. Man Cybern.

[5] David J. Spiegelhalter, et al. Machine Learning, Neural and Statistical Classification, 2009.

[6] Robert A. Jacobs, et al. Methods For Combining Experts' Probability Assessments, 1995, Neural Computation.

[7] Yoav Freund, et al. Experiments with a New Boosting Algorithm, 1996, ICML.

[8] Ian H. Witten, et al. Stacking Bagged and Dagged Models, 1997, ICML.

[9] Catherine Blake, et al. UCI Repository of machine learning databases, 1998.

[10] Thomas G. Dietterich. Ensemble Methods in Machine Learning, 2000, Multiple Classifier Systems.

[11] Thomas P. Minka, et al. Bayesian model averaging is not model combination, 2002.

[12] Leo Breiman, et al. Random Forests, 2001, Machine Learning.

[13] Leo Breiman, et al. Bagging Predictors, 1996, Machine Learning.

[14] Zoubin Ghahramani, et al. Bayesian Learning in Undirected Graphical Models: Approximate MCMC Algorithms, 2004, UAI.

[15] Yuan Qi, et al. Extending expectation propagation for graphical models, 2005.

[16] Max Welling, et al. Learning in Markov Random Fields with Contrastive Free Energies, 2005, AISTATS.

[17] Max Welling, et al. Bayesian Random Fields: The Bethe-Laplace Approximation, 2006, UAI.

[18] Geoffrey E. Hinton, et al. Restricted Boltzmann machines for collaborative filtering, 2007, ICML '07.

[19] G. Takács, et al. On the Gravity Recommendation System, 2007.

[20] Yehuda Koren, et al. The BellKor solution to the Netflix Prize, 2007.

[21] Venu Govindaraju, et al. Review of Classifier Combination Methods, 2008, Machine Learning in Document Analysis and Recognition.

[22] Gerardo Hermosillo, et al. Learning From Crowds, 2010, J. Mach. Learn. Res.

[23] Serge Abiteboul, et al. Corroborating information from disagreeing views, 2010, WSDM '10.

[24] Gjergji Kasneci, et al. CoBayes: Bayesian knowledge corroboration with assessors of unknown areas of expertise, 2011, WSDM '11.