Context-aware result inference in crowdsourcing

Abstract: Many result inference methods have been proposed to address the quality-control problem in crowdsourcing. However, existing methods are ineffective for context-sensitive tasks (CSTs), e.g., handwriting recognition, translation, and speech transcription, where context correlation within a task cannot be ignored. There are two reasons. First, it is ineffective to crowdsource a whole CST (e.g., recognizing a handwritten text) and apply task-level inference methods, because it is hard for a worker to complete a whole complicated task correctly. Second, although a CST is composed of a set of atomic subtasks (e.g., recognizing a handwritten word), it is unsuitable to split it into separate subtasks and apply a subtask-level inference algorithm, because doing so loses the context correlation (e.g., phrases) among subtasks and makes the task harder to complete. A new approach to handling CSTs is therefore needed. In this work, we study the result inference problem for CSTs and propose a context-aware inference algorithm that incorporates context information. Furthermore, we introduce an iterative method to improve the quality. Experiments on real-world CSTs demonstrate the superiority of our approach over state-of-the-art methods.
