Modeling Task Complexity in Crowdsourcing

Complexity is a crucial property for characterizing tasks that humans perform through computer systems. Yet the theory and practice of crowdsourcing currently lack a clear understanding of task complexity, which hinders the design of effective and efficient execution interfaces and of fair monetary rewards. To understand how complexity is perceived and distributed over crowdsourcing tasks, we conducted an experiment in which we asked workers to evaluate the complexity of 61 real-world, re-instantiated crowdsourcing tasks. We show that task complexity, while subjective, is perceived consistently across workers, and that it is significantly influenced by task type. Next, we develop a high-dimensional regression model to assess the influence of three classes of structural features (metadata, content, and visual) on task complexity, and ultimately use these features to measure it. Results show that both the appearance of a task and the language used in its description can accurately predict task complexity. Finally, we apply the same feature set to predict task performance, based on a set of tasks spanning five years of Amazon MTurk activity. Results show that complexity-related features improve the quality of task performance prediction, demonstrating the utility of complexity as a task modeling property.
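The regression step described above can be pictured with a minimal sketch. This is not the authors' implementation: the file name, the column naming scheme (metadata, content, and visual feature prefixes), and the use of scikit-learn's cross-validated Lasso as the sparse high-dimensional regressor are assumptions made for illustration only.

```python
# Minimal sketch (not the paper's code): predict perceived task complexity
# from structural features with an L1-regularized linear model.
# Assumed input: a table with metadata_/content_/visual_ feature columns
# plus the mean worker-assigned complexity score per task.
import pandas as pd
from sklearn.linear_model import LassoCV
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Hypothetical file and column names, chosen for illustration.
tasks = pd.read_csv("tasks_with_features.csv")
feature_cols = [c for c in tasks.columns
                if c.startswith(("metadata_", "content_", "visual_"))]
X = tasks[feature_cols].to_numpy()
y = tasks["mean_complexity"].to_numpy()   # aggregated worker ratings

# The Lasso keeps the model interpretable by driving the weights of
# uninformative features to zero; the regularization strength is
# selected by cross-validation.
model = make_pipeline(StandardScaler(), LassoCV(cv=5, random_state=0))
model.fit(X, y)

lasso = model.named_steps["lassocv"]
selected = [(c, w) for c, w in zip(feature_cols, lasso.coef_) if w != 0.0]
print(f"{len(selected)} of {len(feature_cols)} features retained")
for name, weight in sorted(selected, key=lambda t: -abs(t[1]))[:10]:
    print(f"{name:30s} {weight:+.3f}")
```

Under these assumptions, the same feature matrix, optionally extended with the predicted complexity scores, could then feed a downstream model of task performance, mirroring the prediction experiment summarized in the abstract.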
