An Iterative and Re-weighting Framework for Rejection and Uncertainty Resolution in Crowdsourcing

In practical crowdsourcing applications, labelers may be uncertain about a particular instance, or may refuse to label it (reject it), because of its inherent difficulty; in addition, for large datasets each labeler may be given a different set of instances. These issues lead to missing and uncertain labels, and existing crowdsourcing methods have limited capabilities when both problems are present. In this paper, we propose an Iterative Re-weighted Consensus Maximization framework to address the missing and uncertain label problem. The intuitive idea is to estimate each labeler’s hidden competence iteratively and to cast the problem as a spectral clustering problem in the functional space, so as to minimize the overall loss given missing and uncertain information. One main advantage of the proposed method over state-of-the-art Bayesian model averaging based approaches is that it uncovers the intrinsic consistency among different sets of answers and mines the best possible ground truth. Formal analysis demonstrates that the proposed framework has lower generalization error than the widely adopted majority voting techniques for crowdsourcing. Experimental studies show that the proposed framework outperforms state-of-the-art baselines on several benchmark datasets.
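
To make the iterate-and-re-weight idea concrete, the following is a minimal sketch of weighted voting with missing labels. It is not the framework proposed in the paper: the spectral clustering formulation is replaced here by a plain agreement-rate estimate of competence, and uncertain labels are simply treated as missing. The `MISSING` sentinel, the function name `iterative_weighted_vote`, and the toy label matrix are all assumptions made for illustration.

```python
import numpy as np

MISSING = -1  # hypothetical sentinel: instance rejected by, or never shown to, a labeler


def iterative_weighted_vote(labels, n_classes, n_iter=20):
    """Alternate between a weighted vote and a competence re-estimate.

    labels: (n_labelers, n_instances) integer array, MISSING where no label exists.
    Returns the consensus labels and one weight per labeler.
    """
    labels = np.asarray(labels)
    n_labelers, n_instances = labels.shape
    observed = labels != MISSING
    counts = observed.sum(axis=1)      # number of instances each labeler answered
    weights = np.ones(n_labelers)      # start from plain majority voting
    consensus = None

    for _ in range(n_iter):
        # Step 1: weighted vote per instance, using observed labels only.
        scores = np.zeros((n_instances, n_classes))
        for k in range(n_labelers):
            idx = np.flatnonzero(observed[k])
            np.add.at(scores, (idx, labels[k, idx]), weights[k])
        new_consensus = scores.argmax(axis=1)

        # Step 2: competence = agreement rate with the consensus on answered instances.
        agree = ((labels == new_consensus) & observed).sum(axis=1)
        weights = np.where(counts > 0, agree / np.maximum(counts, 1), 0.0)

        # Stop once the consensus no longer changes.
        if consensus is not None and np.array_equal(new_consensus, consensus):
            break
        consensus = new_consensus

    return new_consensus, weights


# Toy data (made up): four labelers, five instances, two classes.
toy_labels = [
    [0, 1, 0, 1, MISSING],
    [0, 1, 0, MISSING, 1],
    [1, 0, 1, 0, 0],        # a labeler who disagrees with the others everywhere
    [MISSING, 1, 0, 1, 1],
]
consensus, weights = iterative_weighted_vote(toy_labels, n_classes=2)
print(consensus.tolist())  # [0, 1, 0, 1, 1]
print(weights.tolist())    # [1.0, 1.0, 0.0, 1.0]
```

In this toy run the third labeler's weight drops to zero after the first pass, so later votes ignore that labeler. A labeler who answers nothing is given weight zero rather than an undefined rate.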
