Strategies for crowdsourcing social data analysis

Web-based social data analysis tools that rely on public discussion to produce hypotheses or explanations of the patterns and trends in data rarely yield high-quality results in practice. Crowdsourcing offers an alternative approach in which an analyst pays workers to generate such explanations. Yet asking workers with varying skills, backgrounds, and motivations to simply "explain why a chart is interesting" can result in irrelevant, unclear, or speculative explanations of variable quality. To address these problems, we contribute seven strategies for improving the quality and diversity of worker-generated explanations. Our experiments show that using (S1) feature-oriented prompts, providing (S2) good examples, and including (S3) reference gathering, (S4) chart reading, and (S5) annotation subtasks increases the quality of responses by 28% for US workers and 196% for non-US workers. Feature-oriented prompts improve explanation quality by 69% to 236% depending on the prompt. We also show that (S6) pre-annotating charts can focus workers' attention on relevant details, and demonstrate that (S7) generating explanations iteratively increases explanation diversity without increasing worker attrition. We used our techniques to generate 910 explanations for 16 datasets, and found that 63% were of high quality. These results demonstrate that paid crowd workers can reliably generate diverse, high-quality explanations that support the analysis of specific datasets.
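
To make the strategies concrete, here is a minimal sketch of how an analyst might compose a single explanation task that combines a feature-oriented prompt (S1), a good example (S2), and reference-gathering, chart-reading, and annotation subtasks (S3-S5). This is not the authors' implementation; the class, function, and prompt wording are all illustrative assumptions.

```python
# Illustrative sketch only (hypothetical names and wording), not the paper's code:
# composing a crowdsourced explanation task that applies strategies S1-S5.

from dataclasses import dataclass, field
from typing import List


@dataclass
class ExplanationTask:
    chart_url: str
    # S1: a feature-oriented prompt that points workers at a specific visual feature
    prompt: str = "Explain why the highlighted series drops sharply in 2009."
    # S2: an example of a good explanation shown before the worker responds
    good_example: str = ("Example of a good answer: 'Unemployment rose after the "
                         "2008 financial crisis, reducing consumer spending.'")
    # S3-S5: subtasks completed before the free-form explanation
    subtasks: List[str] = field(default_factory=lambda: [
        "Paste a URL for a source that supports your explanation.",    # S3 reference gathering
        "What is the approximate value at the chart's highest point?", # S4 chart reading
        "Mark the region of the chart your explanation refers to.",    # S5 annotation
    ])


def render_task(task: ExplanationTask) -> str:
    """Render the task as plain-text worker instructions."""
    lines = [f"Chart: {task.chart_url}", "", task.prompt, "", task.good_example, ""]
    lines += [f"{i}. {step}" for i, step in enumerate(task.subtasks, 1)]
    lines.append(f"{len(task.subtasks) + 1}. Write your explanation (2-3 sentences).")
    return "\n".join(lines)


if __name__ == "__main__":
    print(render_task(ExplanationTask(chart_url="https://example.com/chart.png")))
```

The rendered text could then be posted to a crowdsourcing platform; for iterative generation (S7), an analyst might repost the task with previously collected explanations shown to workers so that later rounds add new, non-duplicate explanations.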
