Paper Information - A Lazy Man's Approach to Benchmarking: Semisupervised Classifier Evaluation and Recalibration

How many labeled examples are needed to estimate a classifier's performance on a new dataset? We study the case where data are plentiful but labels are expensive. We show that, by making a few reasonable assumptions about the structure of the data, it is possible to estimate performance curves, with confidence bounds, using a small number of ground-truth labels. Our approach, which we call Semisupervised Performance Evaluation (SPE), is based on a generative model for the classifier's confidence scores. In addition to estimating the performance of classifiers on new datasets, SPE can be used to recalibrate a classifier by re-estimating the class-conditional confidence distributions.
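The idea of modeling confidence scores generatively can be illustrated with a small sketch. This is not the authors' implementation: it assumes Gaussian class-conditional score distributions and fits them with a plain semisupervised EM loop, both of which are simplifications chosen here for brevity, and it does not produce the confidence bounds mentioned in the abstract. The names `fit_score_mixture` and `estimated_precision_recall` are hypothetical.

```python
import math
import numpy as np

def normal_pdf(x, mu, sigma):
    """Density of a Gaussian, evaluated elementwise."""
    return np.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * np.sqrt(2.0 * np.pi))

def normal_sf(t, mu, sigma):
    """P(score > t) under a Gaussian (scalar t)."""
    return 0.5 * (1.0 - math.erf((t - mu) / (sigma * math.sqrt(2.0))))

def fit_score_mixture(scores, labels, n_iter=200):
    """Fit a two-component mixture to classifier scores.

    labels holds 1 (positive), 0 (negative), or -1 (unlabeled).
    Assumption of this sketch: each class-conditional is Gaussian.
    Returns (pi, (mu0, s0), (mu1, s1)), where pi is the positive-class prior.
    """
    scores = np.asarray(scores, dtype=float)
    labels = np.asarray(labels)
    known = labels >= 0
    # r[i] = current belief that item i is positive; labeled items stay fixed.
    r = np.where(known, labels, 0.5).astype(float)
    r[~known] = (scores[~known] > np.median(scores)).astype(float)
    for _ in range(n_iter):
        # M-step: weighted estimates of the prior, means, and spreads.
        pi = r.mean()
        mu1 = np.sum(r * scores) / np.sum(r)
        mu0 = np.sum((1 - r) * scores) / np.sum(1 - r)
        s1 = np.sqrt(np.sum(r * (scores - mu1) ** 2) / np.sum(r)) + 1e-6
        s0 = np.sqrt(np.sum((1 - r) * (scores - mu0) ** 2) / np.sum(1 - r)) + 1e-6
        # E-step: posterior for unlabeled items only.
        p1 = pi * normal_pdf(scores, mu1, s1)
        p0 = (1 - pi) * normal_pdf(scores, mu0, s0)
        post = p1 / (p1 + p0 + 1e-300)
        r = np.where(known, labels, post)
    return pi, (mu0, s0), (mu1, s1)

def estimated_precision_recall(params, thresholds):
    """Read a precision-recall curve off the fitted model, not off labels."""
    pi, (mu0, s0), (mu1, s1) = params
    curve = []
    for t in thresholds:
        recall = normal_sf(t, mu1, s1)          # P(score > t | positive)
        fpr = normal_sf(t, mu0, s0)             # P(score > t | negative)
        denom = pi * recall + (1 - pi) * fpr
        precision = pi * recall / denom if denom > 0 else 1.0
        curve.append((float(t), precision, recall))
    return curve

# Synthetic demonstration: 5000 scores, only 20 of them labeled.
rng = np.random.default_rng(0)
y_true = rng.random(5000) < 0.3
demo_scores = np.where(y_true, rng.normal(1.5, 1.0, 5000), rng.normal(-1.0, 1.0, 5000))
demo_labels = np.full(5000, -1)
revealed = rng.choice(5000, size=20, replace=False)
demo_labels[revealed] = y_true[revealed].astype(int)

params = fit_score_mixture(demo_scores, demo_labels)
for t, p, rcl in estimated_precision_recall(params, np.linspace(-1.0, 2.0, 4)):
    print(f"threshold={t:+.1f}  precision={p:.3f}  recall={rcl:.3f}")
```

The fitted class-conditional parameters also give a recalibrated posterior, P(positive | score), through the same E-step formula, which matches the recalibration use the abstract describes.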
