A Two-stage Iterative Approach to Improve Crowdsourcing-Based Relevance Assessment

Crowdsourcing has emerged as a viable platform for carrying out relevance assessment in information retrieval. However, because crowdsourcing relies on independent and anonymous workers, quality control of the assessment results has become an active concern in both academia and industry. To address this problem, we propose a two-stage iterative approach that integrates an ensemble classifier with expert guidance. In the first stage, an ensemble classifier selects unreliable assessment objects for experts to validate; in the second stage, expectation maximization updates all assessment results based on the validation feedback. This loop continues until the cost limit is reached. Simulation experiments demonstrate that, compared with existing solutions, our approach eliminates more noise and thereby achieves higher accuracy, while maintaining an acceptable running time and a low labor cost.
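The following is a minimal sketch of this filter-validate-relabel loop in Python, assuming crowd labels are given as an items-by-workers matrix with -1 marking missing answers. The names (em_dawid_skene, select_for_validation, two_stage_loop) and the budget parameters are illustrative, not taken from the paper; the stage-1 ensemble classifier is replaced here by a simple low-posterior-confidence heuristic for brevity, and the EM step follows the classic Dawid-Skene formulation, which may differ from the paper's exact variant.

```python
import numpy as np


def em_dawid_skene(labels, fixed=None, n_iter=50):
    """Dawid-Skene-style EM over a worker-label matrix.

    labels: (n_items, n_workers) int array, classes 0..K-1, -1 for missing.
    fixed:  dict {item_index: true_class} from expert validation; these
            posteriors are clamped to the expert answer during EM.
    Returns posterior class probabilities of shape (n_items, K).
    """
    n_items, n_workers = labels.shape
    K = int(labels.max()) + 1
    fixed = fixed or {}

    # Initialise posteriors with per-item vote fractions (soft majority vote).
    post = np.full((n_items, K), 1e-6)
    for i in range(n_items):
        for w in range(n_workers):
            if labels[i, w] >= 0:
                post[i, labels[i, w]] += 1.0
    post /= post.sum(axis=1, keepdims=True)

    for _ in range(n_iter):
        # Clamp items already validated by experts.
        for i, y in fixed.items():
            post[i] = np.eye(K)[y]
        # M-step: class priors and per-worker confusion matrices.
        prior = post.mean(axis=0)
        conf = np.full((n_workers, K, K), 1e-6)
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            pw, lw = post[mask], labels[mask, w]
            for k in range(K):
                conf[w, :, k] += pw[lw == k].sum(axis=0)
            conf[w] /= conf[w].sum(axis=1, keepdims=True)
        # E-step: recompute posteriors from priors and confusion matrices.
        log_post = np.tile(np.log(prior), (n_items, 1))
        for w in range(n_workers):
            mask = labels[:, w] >= 0
            log_post[mask] += np.log(conf[w][:, labels[mask, w]].T)
        post = np.exp(log_post - log_post.max(axis=1, keepdims=True))
        post /= post.sum(axis=1, keepdims=True)

    for i, y in fixed.items():
        post[i] = np.eye(K)[y]
    return post


def select_for_validation(post, already_fixed, budget):
    """Stage-1 stand-in: pick the lowest-confidence, not-yet-validated items."""
    confidence = post.max(axis=1)
    order = np.argsort(confidence)  # least confident first
    picked = [int(i) for i in order if i not in already_fixed]
    return picked[:budget]


def two_stage_loop(labels, expert_oracle, per_round=5, total_budget=20):
    """Iterate: select suspicious items -> expert validation -> EM relabelling."""
    fixed = {}
    post = em_dawid_skene(labels)
    while len(fixed) < total_budget:
        batch = select_for_validation(
            post, fixed, min(per_round, total_budget - len(fixed)))
        if not batch:
            break
        for i in batch:
            fixed[i] = expert_oracle(i)  # expert judgement (simulated here)
        post = em_dawid_skene(labels, fixed=fixed)
    return post.argmax(axis=1)
```

In practice, expert_oracle would be a human relevance judge and total_budget the labor or monetary limit referred to in the abstract; the selection heuristic is the natural place to plug in the paper's ensemble classifier instead of the posterior-confidence shortcut used above.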
