Estimating Accuracy from Unlabeled Data

We consider the question of how unlabeled data can be used to estimate the true accuracy of learned classifiers. This is an important question for any autonomous learning system that must estimate its accuracy without supervision, and it also arises when classifiers trained on one data distribution must be applied to a new distribution (e.g., document classifiers trained on one text corpus are to be applied to a second corpus). We first show how to estimate error rates exactly from unlabeled data when given a collection of competing classifiers that make independent errors, based on the agreement rates between subsets of these classifiers. We further show that even when the competing classifiers do not make independent errors, both their accuracies and error dependencies can be estimated under certain relaxed assumptions. Experiments on two real-world data sets produce estimates within a few percent of the true accuracy, using solely unlabeled data. These results are of practical significance in situations where labeled data is scarce, and they shed light on the more general question of how the consistency among multiple functions relates to their true accuracies.
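
The independent-errors case admits a simple closed-form illustration. If two classifiers err independently with rates e_i and e_j, they disagree exactly when one of the two errs, so the disagreement rate is d_ij = e_i + e_j - 2 e_i e_j, which factors as 1 - 2 d_ij = (1 - 2 e_i)(1 - 2 e_j). With three classifiers, the three pairwise disagreement rates therefore determine all three error rates. The sketch below is a minimal illustration of this idea for three binary classifiers that err independently and better than chance; it is not the paper's full formulation (which uses agreement rates over arbitrary subsets), and all function and variable names are illustrative.

```python
# Minimal sketch: recover error rates of three binary classifiers from
# pairwise disagreement rates on unlabeled data, assuming mutually
# independent errors and error rates below 0.5 (better than chance).
import numpy as np

def error_rates_from_agreement(y1, y2, y3):
    """Estimate the error rates of three binary classifiers from their
    predictions on the same unlabeled sample."""
    y1, y2, y3 = (np.asarray(y) for y in (y1, y2, y3))
    # Empirical pairwise disagreement rates.
    d12 = np.mean(y1 != y2)
    d13 = np.mean(y1 != y3)
    d23 = np.mean(y2 != y3)
    # Independence gives 1 - 2*d_ij = (1 - 2*e_i)(1 - 2*e_j);
    # solve the three pairwise products for each factor c_i = 1 - 2*e_i.
    p12, p13, p23 = 1 - 2 * d12, 1 - 2 * d13, 1 - 2 * d23
    c1 = np.sqrt(p12 * p13 / p23)
    c2 = np.sqrt(p12 * p23 / p13)
    c3 = np.sqrt(p13 * p23 / p12)
    # Take the root with e_i < 0.5 (better-than-chance assumption).
    return (1 - c1) / 2, (1 - c2) / 2, (1 - c3) / 2

# Synthetic check: three classifiers with independent errors at known rates.
rng = np.random.default_rng(0)
truth = rng.integers(0, 2, size=100_000)
preds = [np.where(rng.random(truth.size) < e, 1 - truth, truth)
         for e in (0.1, 0.2, 0.3)]
print(error_rates_from_agreement(*preds))  # approximately (0.1, 0.2, 0.3)
```

With more than three classifiers the system of agreement equations is overdetermined, and the same identity makes the better-than-chance assumption checkable in practice, since every observed 1 - 2*d_ij must then be positive.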
