Simple Multiple Noisy Label Utilization Strategies

With the outsourcing of small tasks becoming easier, it is possible to obtain non-expert/imperfect labels at low cost. With low-cost imperfect labeling, it is straightforward to collect multiple labels for the same data items. This paper addresses strategies for utilizing these multiple labels to improve the performance of supervised learning, based on two basic ideas: majority voting and pairwise solutions. Our experiments show several interesting results. The soft majority voting strategies can reduce bias and roughness, and improve on the performance of the direct hard majority voting strategy. Pairwise strategies can completely avoid bias by considering both sides (the potentially correct and the incorrect/noisy information) for binary classification. They perform very well whether few or many labels are available; however, they can also retain the noise. The improved variation, which reduces the impact of the noisy information, is therefore recommended. All five strategies investigated are agnostic to labeling quality and can be applied directly to real-world applications. The experimental results show that some of them perform better than, or at least very close to, the quality-gnostic strategies.
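To make the two basic ideas concrete, below is a minimal Python sketch of hard and soft majority voting over the labels collected for one data item, together with the pairwise idea of keeping weighted copies of both labels for binary classification. The function names and the list-of-labels representation are illustrative assumptions, not the paper's implementation; in particular, the tie-breaking rule and the exact pairwise weighting scheme are not taken from the paper.

```python
from collections import Counter

def hard_majority_label(labels):
    """Hard majority voting: keep only the most frequent label.

    Ties are broken arbitrarily by Counter ordering (assumption; the
    paper's tie-breaking rule is not reproduced here).
    """
    return Counter(labels).most_common(1)[0][0]

def soft_majority_weights(labels):
    """Soft majority voting: keep every observed label, weighted by its
    empirical frequency among the labels collected for this item."""
    counts = Counter(labels)
    return {label: count / len(labels) for label, count in counts.items()}

def pairwise_examples(labels):
    """Pairwise idea (binary classification, sketch): emit one weighted
    training copy per class, so both the potentially correct side and
    the incorrect/noisy side are represented."""
    weights = soft_majority_weights(labels)
    return [(label, weight) for label, weight in weights.items()]

# Five noisy binary labels collected for one data item.
labels = [1, 1, 0, 1, 0]
print(hard_majority_label(labels))    # -> 1
print(soft_majority_weights(labels))  # -> {1: 0.6, 0: 0.4}
print(pairwise_examples(labels))      # -> [(1, 0.6), (0, 0.4)]
```

Note that all three routines use only the observed label counts, which is what makes the strategies agnostic to labeling quality: no per-labeler accuracy estimate is required.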
