PAC-Bayesian AUC classification and scoring

We develop a scoring and classification procedure based on the PAC-Bayesian approach and the AUC (area under the ROC curve) criterion. We focus initially on the class of linear score functions. We derive non-asymptotic PAC-Bayesian bounds for two types of priors on the score parameters: a Gaussian prior and a spike-and-slab prior; the latter makes it possible to perform feature selection. One important advantage of our approach is that it is amenable to powerful Bayesian computational tools. In particular, we derive a Sequential Monte Carlo algorithm, an efficient method that may be used as a gold standard, and an Expectation-Propagation algorithm, a much faster but approximate method. We also extend our method to a class of non-linear score functions by considering a Gaussian process prior, which essentially leads to a nonparametric procedure.
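PAC-Bayesian procedures are commonly built around a Gibbs-type pseudo-posterior that reweights the prior by the exponentiated empirical risk. The sketch below illustrates that idea for linear scores s_θ(x) = ⟨θ, x⟩ under a Gaussian prior N(0, σ²I), assuming the procedure takes this usual form. Several choices are assumptions made only for illustration: labels in {1, -1}, an empirical AUC risk equal to the fraction of misordered positive/negative pairs (ties counted as half an error), the temperature `xi`, and the hypothetical function names. The sampler is a plain random-walk Metropolis-Hastings chain, a simpler stand-in rather than the Sequential Monte Carlo or Expectation-Propagation algorithms described above.

```python
import math
import random


def empirical_auc_risk(theta, X, y):
    """Fraction of (positive, negative) pairs misordered by the linear score
    s_theta(x) = <theta, x>. Ties count as half an error (an assumption that
    matches the Mann-Whitney form of the AUC). Labels are assumed in {1, -1}."""
    scores = [sum(tk * xk for tk, xk in zip(theta, x)) for x in X]
    pos = [s for s, label in zip(scores, y) if label == 1]
    neg = [s for s, label in zip(scores, y) if label == -1]
    errors = 0.0
    for sp in pos:
        for sn in neg:
            if sp < sn:
                errors += 1.0
            elif sp == sn:
                errors += 0.5
    return errors / (len(pos) * len(neg))


def log_pseudo_posterior(theta, X, y, xi, prior_var):
    """Unnormalised log density of exp(-xi * R_n(theta)) * N(theta; 0, prior_var * I)."""
    log_prior = -0.5 * sum(t * t for t in theta) / prior_var
    return -xi * empirical_auc_risk(theta, X, y) + log_prior


def random_walk_mh(X, y, xi=50.0, prior_var=1.0, step=0.3, n_iter=2000, seed=0):
    """Random-walk Metropolis-Hastings targeting the pseudo-posterior.
    Hypothetical illustration only: a simple stand-in for SMC or EP."""
    rng = random.Random(seed)
    theta = [0.0] * len(X[0])
    current = log_pseudo_posterior(theta, X, y, xi, prior_var)
    samples = []
    for _ in range(n_iter):
        # Symmetric Gaussian proposal, so the MH ratio is just the target ratio.
        proposal = [t + rng.gauss(0.0, step) for t in theta]
        candidate = log_pseudo_posterior(proposal, X, y, xi, prior_var)
        # Accept with probability min(1, exp(candidate - current)).
        if rng.random() < math.exp(min(0.0, candidate - current)):
            theta, current = proposal, candidate
        samples.append(list(theta))
    return samples


if __name__ == "__main__":
    # Toy data: positives have larger first coordinates.
    X = [[1.0, 0.2], [2.0, -0.1], [1.5, 0.3], [-1.0, 0.1], [-2.0, -0.2], [-0.5, 0.0]]
    y = [1, 1, 1, -1, -1, -1]
    kept = random_walk_mh(X, y)[500:]  # discard burn-in
    mean = [sum(s[k] for s in kept) / len(kept) for k in range(2)]
    print("pseudo-posterior mean:", mean)
    print("empirical AUC risk of the mean:", empirical_auc_risk(mean, X, y))
```

On such toy data the averaged samples should rank most positive/negative pairs correctly. A larger `xi` concentrates the pseudo-posterior more sharply around empirical AUC maximizers, while the prior keeps the norm of θ under control; swapping the Gaussian log-prior for a spike-and-slab one would be the natural route to feature selection.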
