MomResp: A Bayesian Model for Multi-Annotator Document Labeling

Data annotation in modern practice often involves multiple, imperfect human annotators. Multiple annotations can be used to infer estimates of the ground-truth labels and to estimate individual annotator error characteristics (or reliability). We introduce MomResp, a model that incorporates information both from natural data clusters and from the annotations of multiple annotators to infer ground-truth labels and annotator reliability for the document classification task. We implement this model and show dramatic improvements over majority vote in situations where annotations are both scarce and of low quality, as well as in situations where annotators disagree consistently. Because MomResp predictions are subject to label switching, we introduce a solution that finds nearly optimal predicted class reassignments in a variety of settings, using only information available to the model at inference time. Although MomResp does not perform well in annotation-rich situations, we show evidence suggesting how this shortcoming may be overcome in future work.
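
The label-switching fix mentioned above amounts to choosing a reassignment (permutation) of the model's predicted class indices using only information available at inference time. As a rough illustration of that general idea, and not the paper's exact procedure, the sketch below aligns a mixture model's arbitrary cluster indices to annotation-derived classes by solving an assignment problem (Hungarian method) against majority-vote labels. It assumes numpy and scipy are available; the function names and toy data are purely illustrative.

    # Minimal sketch (illustrative, not the paper's implementation):
    # resolve label switching by mapping predicted cluster indices to the
    # class indices they most often co-occur with under majority vote.
    import numpy as np
    from scipy.optimize import linear_sum_assignment

    def majority_vote(annotations, num_classes):
        """Aggregate per-document annotations (lists of class indices) by majority vote."""
        votes = np.zeros((len(annotations), num_classes), dtype=int)
        for doc, labels in enumerate(annotations):
            for c in labels:
                votes[doc, c] += 1
        return votes.argmax(axis=1)

    def align_clusters(predicted, reference, num_classes):
        """Permute predicted cluster indices to best match reference labels
        by maximizing agreement via an assignment problem."""
        confusion = np.zeros((num_classes, num_classes), dtype=int)
        for p, r in zip(predicted, reference):
            confusion[p, r] += 1
        # Hungarian method minimizes cost, so negate the agreement counts.
        rows, cols = linear_sum_assignment(-confusion)
        mapping = dict(zip(rows, cols))
        return np.array([mapping[p] for p in predicted])

    # Toy usage: align arbitrary model output to majority-vote labels,
    # using only the annotations themselves (available at inference time).
    annotations = [[0, 0, 1], [1, 1], [2, 2, 0], [1]]
    predicted_clusters = np.array([2, 0, 1, 0])
    reference = majority_vote(annotations, num_classes=3)
    print(align_clusters(predicted_clusters, reference, num_classes=3))

Solving the alignment as an assignment problem guarantees a one-to-one mapping between predicted clusters and classes, which is what distinguishes a label-switching correction from simply relabeling each cluster independently.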
