Mini-crowdsourcing end-user assessment of intelligent assistants: A cost-benefit study

Intelligent assistants sometimes handle tasks too important to be trusted implicitly. End users can establish trust through systematic assessment, but such assessment is costly. This paper investigates whether, when, and how bringing a small crowd of end users to bear on the assessment of an intelligent assistant is useful from a cost-benefit perspective. Our results show that a mini-crowd of testers provided many more benefits than the obvious decrease in workload, but these benefits did not scale linearly with mini-crowd size; there was a point of diminishing returns beyond which the cost-benefit ratio became less attractive.