Learning with Annotation Noise

It is usually assumed that the noise in annotated data is random classification noise. Yet there is evidence that differences between annotators are not always random attention slips. At least for the harder-to-decide cases, they could instead result from annotators' different biases towards the classification categories. Under an annotation generation model that takes this into account, there is a hazard that some of the training instances are actually hard cases with unreliable annotations. We show that these are relatively unproblematic for an algorithm operating under the 0-1 loss model, whereas for the commonly used voted perceptron algorithm, hard training cases could result in incorrect predictions on the uncontroversial cases at test time.
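A minimal sketch of the setting, assuming a simplified and hypothetical version of the annotation model: easy items always receive their true label, while each hard item receives +1 with a fixed annotator `bias` probability, independently of its true label. The voted perceptron follows Freund and Schapire's formulation, in which every intermediate weight vector votes at test time, weighted by how many consecutive training examples it classified correctly. The names `annotate`, `train_voted_perceptron`, `predict_voted`, the parameters `hard_mask` and `bias`, and the demo's choice of treating `|t| < 0.5` as "hard" are illustrative assumptions. This is not the paper's experimental setup.

```python
import random
from typing import List, Tuple

Vector = List[float]


def dot(w: Vector, x: Vector) -> float:
    return sum(wi * xi for wi, xi in zip(w, x))


def annotate(true_labels: List[int], hard_mask: List[bool],
             bias: float, rng: random.Random) -> List[int]:
    """Hypothetical annotation model: easy items keep their true label;
    hard items get +1 with probability `bias`, else -1, ignoring the truth."""
    labels = []
    for y, hard in zip(true_labels, hard_mask):
        if hard:
            labels.append(1 if rng.random() < bias else -1)
        else:
            labels.append(y)
    return labels


def train_voted_perceptron(X: List[Vector], y: List[int],
                           epochs: int) -> List[Tuple[Vector, int]]:
    """Voted perceptron: store each weight vector with its survival count c."""
    w = [0.0] * len(X[0])
    c = 0
    history: List[Tuple[Vector, int]] = []
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * dot(w, xi) <= 0:
                # Mistake: retire the current vector with its vote count.
                history.append((w, c))
                w = [wj + yi * xj for wj, xj in zip(w, xi)]  # new list, no aliasing
                c = 1
            else:
                c += 1
    history.append((w, c))
    return history


def predict_voted(history: List[Tuple[Vector, int]], x: Vector) -> int:
    """Each stored vector casts c votes for the sign of its score (0 counts as -1)."""
    score = sum(c * (1 if dot(w, x) > 0 else -1) for w, c in history)
    return 1 if score > 0 else -1


if __name__ == "__main__":
    rng = random.Random(1)
    # Hypothetical 1-D task with a constant bias feature; true label is sign(t).
    ts = [rng.uniform(-3.0, 3.0) for _ in range(200)]
    X = [[1.0, t] for t in ts]
    truth = [1 if t > 0 else -1 for t in ts]
    hard = [abs(t) < 0.5 for t in ts]  # assumption: hard cases lie near the boundary
    noisy = annotate(truth, hard, bias=0.9, rng=rng)
    model = train_voted_perceptron(X, noisy, epochs=10)
    # Inspect predictions on clearly easy (uncontroversial) test points.
    easy_test = [[1.0, t] for t in (-2.0, -1.0, 1.0, 2.0)]
    print([predict_voted(model, x) for x in easy_test])
```

Varying `bias` and the width of the hard region in the demo is one way to probe how biased labels on hard training cases shift the voted classifier's decisions on easy test points.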
