Improving Multiclass Classification in Crowdsourcing by Using Hierarchical Schemes

In this paper, we propose a method of improving accuracy of multiclass classification tasks in crowdsourcing. In crowdsourcing, it is important to assign appropriate workers to appropriate tasks. In multiclass classification, different workers are good at different subcategories. In our method, we reorganize a given flat classification task into a hierarchical classification task consisting of several subtasks, and assign each worker to an appropriate subtask. In this approach, it is important to choose a good hierarchy. In our method, we first post a flat classification task with a part of data and collect statistics on each worker's ability. Based on the obtained statistics, we simulate all candidate hierarchical schemes, estimate their expected accuracy, choose the best scheme, and post it with the rest of data. In our method, it is also important to allocate workers to appropriate subtasks. We designed several greedy worker allocation algorithms. The results of our experiments show that our method improves the accuracy of multiclass classification tasks.

[1]  Harold W. Kuhn,et al.  The Hungarian method for the assignment problem , 1955, 50 Years of Integer Programming.

[2]  Pramod K. Varshney,et al.  Reliable Crowdsourcing for Multi-Class Labeling Using Coding Theory , 2013, IEEE Journal of Selected Topics in Signal Processing.

[3]  Mausam,et al.  Crowdsourcing Multi-Label Classification for Taxonomy Creation , 2013, HCOMP.

[4]  A. P. Dawid,et al.  Maximum Likelihood Estimation of Observer Error‐Rates Using the EM Algorithm , 1979 .

[5]  Joydeep Ghosh,et al.  Hierarchical Fusion of Multiple Classifiers for Hyperspectral Data Analysis , 2002, Pattern Analysis & Applications.

[6]  Hisashi Kashima,et al.  Quality Control for Crowdsourced Hierarchical Classification , 2015, 2015 IEEE International Conference on Data Mining.

[7]  Panagiotis G. Ipeirotis,et al.  Quality management on Amazon Mechanical Turk , 2010, HCOMP '10.

[8]  Gang Chen,et al.  An online cost sensitive decision-making method in crowdsourcing systems , 2013, SIGMOD '13.

[9]  Javier R. Movellan,et al.  Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise , 2009, NIPS.

[10]  Alex A. Freitas,et al.  A survey of hierarchical classification across different application domains , 2010, Data Mining and Knowledge Discovery.

[11]  Panagiotis G. Ipeirotis,et al.  Get another label? improving data quality and data mining using multiple, noisy labelers , 2008, KDD.

[12]  David Gross-Amblard,et al.  Using Hierarchical Skills for Optimized Task Assignment in Knowledge-Intensive Crowdsourcing , 2016, WWW.

[13]  Eric Horvitz,et al.  Planning for Crowdsourcing Hierarchical Tasks , 2015, AAMAS.

[14]  Devavrat Shah,et al.  Efficient crowdsourcing for multi-class labeling , 2013, SIGMETRICS '13.

[15]  Camille Couprie,et al.  Learning Hierarchical Features for Scene Labeling , 2013, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[16]  Lydia B. Chilton,et al.  Cascade: crowdsourcing taxonomy creation , 2013, CHI.

[17]  Guoliang Li,et al.  Crowdsourced Selection on Multi-Attribute Data , 2017, CIKM.