Quality Control for Crowdsourced Hierarchical Classification

Repeated labeling is a widely adopted quality control method in crowdsourcing: because a single label from one worker varies widely in accuracy, a reliable label is selected from multiple labels collected from several workers. Hierarchical classification, in which the classes are organized into a hierarchy, is a common crowdsourcing task. However, directly applying existing aggregation methods designed for flat multi-class classification forces them to discriminate among a large number of classes at once. In this paper, we propose a label aggregation method for hierarchical classification tasks. Our method exploits the hierarchical structure to handle a large number of classes and to estimate worker abilities more precisely. It is inspired by the steps model from item response theory, which models examinees' responses to sequentially dependent questions. We regard a hierarchical classification task as a question composed of a sequence of subquestions and build a worker response model on this view. Experiments on real crowdsourced hierarchical classification tasks demonstrate that incorporating the hierarchical structure improves label aggregation accuracy.
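To make the modeling idea concrete: in the steps model of item response theory, a multi-step question is answered step by step, and the probability that an examinee with ability $\theta$ passes step $k$, given that all earlier steps were passed, takes a Rasch-type form, roughly $P(\text{pass } k \mid \text{passed } 1,\dots,k-1) = \exp(\theta - b_k) / (1 + \exp(\theta - b_k))$, where $b_k$ is the difficulty of step $k$. This is a standard statement of the steps model, not a formula taken from the paper itself.

The decomposition of a hierarchical label into sequential subquestions can likewise be sketched in code. The following is a minimal, hypothetical Python illustration: it replaces the paper's probabilistic worker response model with plain top-down majority voting as a stand-in, and all names (PARENT, aggregate_top_down, the toy hierarchy) are invented for this sketch.

# Sketch of the sequential-subquestion view of hierarchical classification.
# A leaf label is treated as a root-to-leaf path, resolved one branching
# decision at a time. The paper's method fits a steps-model-style worker
# response model; plain majority voting stands in for it here.

from collections import Counter

# Toy class hierarchy as a child -> parent map; None marks top-level classes.
PARENT = {
    "animal": None, "vehicle": None,
    "dog": "animal", "cat": "animal",
    "car": "vehicle", "bike": "vehicle",
}

def path_to(label):
    """Return the root-to-leaf path for a leaf label, e.g. "dog" -> ["animal", "dog"]."""
    path = []
    while label is not None:
        path.append(label)
        label = PARENT[label]
    return list(reversed(path))

def aggregate_top_down(worker_labels):
    """Aggregate workers' leaf labels by resolving one hierarchy level at a time.

    worker_labels: dict mapping worker id -> leaf label for one item.
    At each depth, only workers whose path agrees with the decisions made so
    far keep a vote, mirroring the sequential dependence between subquestions.
    """
    paths = {w: path_to(lbl) for w, lbl in worker_labels.items()}
    decided = []
    depth = 0
    while True:
        votes = Counter(
            p[depth] for p in paths.values()
            if len(p) > depth and p[:depth] == decided
        )
        if not votes:
            break
        decided.append(votes.most_common(1)[0][0])
        depth += 1
    return decided[-1] if decided else None

if __name__ == "__main__":
    # Three workers label one item; the aggregated leaf label is "dog".
    print(aggregate_top_down({"w1": "dog", "w2": "dog", "w3": "car"}))

The paper's method would instead weight each per-level decision by worker abilities estimated under the steps-model-based response model, rather than counting all agreeing workers equally as this sketch does.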
