A Quality-Sensitive Method for Learning from Crowds

In real-world applications, an oracle that can label all instances correctly may not exist or may be too expensive to acquire. Crowdsourcing offers an alternative: an easy way to obtain labels at low cost from multiple non-expert annotators. Over the past few years, much attention has been paid to learning from such crowdsourced data, a problem known as Learning from Crowds (LFC). Despite their sound statistical foundations, existing LFC methods still suffer from several disadvantages: they require prior knowledge to select the expertise model that represents annotator behavior, involve non-convex optimization problems, or restrict the type of classifier that can be used. This paper addresses LFC from a quality-sensitive perspective and presents a novel framework named QS-LFC. By reformulating the original LFC problem as a quality-sensitive learning problem, QS-LFC can avoid these disadvantages. Further, a support vector machine (SVM) implementation of QS-LFC is proposed. Experimental results on both synthetic and real-world data sets demonstrate that QS-LFC can achieve better generalization performance and is more robust to noisy labels than the existing methods.
