Toward Optimal Labeling Strategy under Multiple Unreliable Labelers

One of the most resource-intensive tasks in building a pattern recognition system is data collection, specifically the acquisition of sample labels from subject-matter experts. The first part of this paper explores an EM algorithm for training classifiers using labelers of varying reliability. Exploiting unreliable labelers opens up the possibility of assigning multiple labelers to judge the same sample. The second part of this paper examines an optimal strategy for assigning labelers to samples so as to maximize the information given to the learning system. The optimal labeling strategy is examined and illustrated for the idealized case of two labelers and two samples.
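To make the first idea concrete, here is a minimal sketch of EM for pooling labels from unreliable labelers, written in the style of the classic Dawid–Skene observer-error model. It is not necessarily the formulation used in this paper. The sketch assumes binary labels, one confusion matrix per labeler, and labelers who err independently given the true label. The function name `dawid_skene_em` and the `(sample, labeler, label)` input format are hypothetical. The E-step computes each sample's posterior label, and the M-step re-estimates the class prior and every labeler's confusion matrix from those posteriors.

```python
from collections import defaultdict


def dawid_skene_em(obs, n_samples, n_labelers, n_iter=50):
    """Hypothetical sketch: EM for binary labels from unreliable labelers.

    obs: list of (sample, labeler, label) triples with label in {0, 1}.
    Assumes n_iter >= 1 and labelers that err independently given the truth.
    Returns (q, conf): q[i] = P(y_i = 1 | obs) and
    conf[j][k][l] = P(labeler j reports l | true label k).
    """
    # Initialise posteriors from the fraction of positive votes per sample.
    votes = defaultdict(list)
    for i, _, l in obs:
        votes[i].append(l)
    q = [sum(votes[i]) / len(votes[i]) if votes[i] else 0.5
         for i in range(n_samples)]

    for _ in range(n_iter):
        # M-step: class prior and per-labeler confusion matrices,
        # with add-one smoothing so no probability collapses to zero.
        prior1 = (sum(q) + 1.0) / (n_samples + 2.0)
        counts = [[[1.0, 1.0], [1.0, 1.0]] for _ in range(n_labelers)]
        for i, j, l in obs:
            counts[j][1][l] += q[i]
            counts[j][0][l] += 1.0 - q[i]
        conf = [[[c[k][l] / (c[k][0] + c[k][1]) for l in (0, 1)]
                 for k in (0, 1)] for c in counts]

        # E-step: posterior over each sample's true label.
        like1 = [prior1] * n_samples
        like0 = [1.0 - prior1] * n_samples
        for i, j, l in obs:
            like1[i] *= conf[j][1][l]
            like0[i] *= conf[j][0][l]
        q = [a / (a + b) for a, b in zip(like1, like0)]
    return q, conf
```

Suppose two labelers always report the truth and a third always reports the opposite. The sketch then learns that the third labeler's confusion matrix is flipped, so its votes become evidence for the opposite label rather than noise. For long label lists, a real implementation would work in log space to avoid underflow in the products.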

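The two-labeler, two-sample case can also be explored numerically. The sketch below scores each way of assigning labelers to samples by the mutual information I(Y; L) between the true labels and the reported labels. Using mutual information as "the information given to the learning system" is an assumption, and so is the rest of the model: independent true labels with prior `prior`, and labelers with symmetric accuracy `acc[j]` who each judge exactly one sample. The functions `assignment_information` and `best_assignment` are hypothetical names, and the exhaustive search is only practical for tiny cases like this one.

```python
from collections import defaultdict
from itertools import product
from math import log2


def assignment_information(assignment, acc, prior=0.5, n_samples=2):
    """Hypothetical sketch: I(Y; L) in bits for one labeler-to-sample assignment.

    assignment[j] is the index of the sample judged by labeler j;
    acc[j] is the assumed symmetric probability that labeler j is correct.
    """
    joint = {}
    for y in product((0, 1), repeat=n_samples):
        # Independent true labels with P(y_i = 1) = prior.
        p_y = 1.0
        for y_i in y:
            p_y *= prior if y_i == 1 else 1.0 - prior
        for l in product((0, 1), repeat=len(assignment)):
            # Labelers err independently given the true labels.
            p_l_given_y = 1.0
            for j, s in enumerate(assignment):
                p_l_given_y *= acc[j] if l[j] == y[s] else 1.0 - acc[j]
            joint[(y, l)] = p_y * p_l_given_y
    marg_y, marg_l = defaultdict(float), defaultdict(float)
    for (y, l), p in joint.items():
        marg_y[y] += p
        marg_l[l] += p
    return sum(p * log2(p / (marg_y[y] * marg_l[l]))
               for (y, l), p in joint.items() if p > 0)


def best_assignment(acc, n_samples=2):
    """Exhaustively score every labeler-to-sample assignment."""
    candidates = product(range(n_samples), repeat=len(acc))
    return max(candidates,
               key=lambda a: assignment_information(a, acc, n_samples=n_samples))
```

Consider two labelers who are each 90% accurate. Under this measure, giving each labeler a different sample yields roughly 1.06 bits. Assigning both to the same sample yields roughly 0.74 bits. Other accuracies, priors, or information criteria may change which assignment wins.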