Predicting result quality in Crowdsourcing using application layer monitoring

Crowdsourcing has become a valuable tool for many business applications that require the results generated by the workers to meet a certain quality level. Several quality assurance mechanisms have therefore been developed, some of which are deployed in commercial crowdsourcing platforms. However, these mechanisms usually impose additional work overhead on the worker, e.g., by adding test questions, or increase the costs for the employer, e.g., by replicating the task for majority decisions. In this work, we analyze the applicability of implicit measurements to objectively estimate the quality of the workers' results. First efforts in this area have already been made by investigating the impact of the task completion time. We extend this research by deploying an application layer monitoring (ALM), which makes it possible to observe the workers' interactions with our task interface at a much more detailed level. Based on an exemplary use case, we discuss a possible implementation and demonstrate the potential of the approach by predicting the quality of the workers' submissions from our monitoring results. This ALM provides a new way to identify low-quality work as well as difficulties in fulfilling the formulated tasks in the domain of crowdsourcing.
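
The abstract does not state how the interaction monitoring is realized technically. The following is a minimal sketch of one plausible client-side realization of such an ALM, assuming a browser-based task page instrumented with TypeScript: interaction events (focus, click, key presses) on the form fields are buffered and sent to a logging endpoint when the task is submitted. All identifiers here (AlmEvent, recordEvent, instrumentTaskForm, flushEvents, the /alm/log endpoint) are illustrative assumptions, not the authors' implementation.

    // Hypothetical sketch of client-side application layer monitoring (ALM)
    // for a crowdsourcing task interface. Names and endpoint are assumptions.

    interface AlmEvent {
      workerId: string;   // identifier of the worker session
      type: string;       // e.g. "focus", "blur", "click", "keyup"
      target: string;     // id of the form element that received the event
      timestamp: number;  // milliseconds since the task page was loaded
    }

    const events: AlmEvent[] = [];
    const pageLoadedAt = Date.now();

    // Append a single interaction event to the local buffer.
    function recordEvent(workerId: string, type: string, target: string): void {
      events.push({ workerId, type, target, timestamp: Date.now() - pageLoadedAt });
    }

    // Attach listeners to every input field of the (hypothetical) task form.
    function instrumentTaskForm(workerId: string): void {
      document
        .querySelectorAll<HTMLElement>("#task-form input, #task-form textarea")
        .forEach((el) => {
          ["focus", "blur", "click", "keyup"].forEach((type) =>
            el.addEventListener(type, () => recordEvent(workerId, type, el.id))
          );
        });
    }

    // Ship the buffered events to the employer's server on submission, so that
    // per-field dwell times and interaction counts can be derived offline.
    function flushEvents(): Promise<Response> {
      return fetch("/alm/log", {
        method: "POST",
        headers: { "Content-Type": "application/json" },
        body: JSON.stringify(events),
      });
    }

From such event logs, features like per-field dwell time, number of focus changes, or bursts of key presses could be derived and fed into a classifier that predicts the quality of a submission, in the spirit of the prediction step described in the abstract; the concrete features and prediction model used by the authors are not specified here.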
