FourEyes

Crowdsourcing is a common means of collecting image segmentation training data for a variety of computer vision applications. However, designing accurate crowd-powered image segmentation systems is challenging: defining object boundaries in an image requires significant fine motor skill and hand-eye coordination, which makes these tasks error-prone. Typically, specialized segmentation tools are created, and answers from multiple workers are aggregated to generate more accurate results. However, an individual tool's design can bias how and where people make mistakes, resulting in shared errors that persist even after aggregation. In this article, we introduce a novel crowdsourcing approach that leverages tool diversity as a means of improving aggregate crowd performance. Our idea is that, given a diverse set of tools, aggregating answers across tools can improve collective performance by offsetting the systematic biases induced by the individual tools themselves. To demonstrate the effectiveness of the proposed approach, we design four different tools and present FourEyes, a crowd-powered image segmentation system that aggregates across them. We then conduct a series of studies that evaluate different aggregation conditions and show that using multiple tools can significantly improve aggregate accuracy. Furthermore, we investigate post-processing for multi-tool aggregation as an error-correction mechanism. We introduce a novel region-based method that synthesizes more accurate boundaries for image segmentation tasks by averaging surrounding annotations, and we explore the effect of adjusting the threshold parameter of an EM-based aggregation method. Our results suggest that not only an individual tool's design but also the correction mechanism can affect the performance of multi-tool aggregation. This article extends work presented at ACM IUI 2018 [46] by providing a novel region-based error-correction method and an additional in-depth evaluation of the proposed approach.
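To make the aggregation idea concrete, the sketch below shows pixel-wise aggregation of binary segmentation masks collected with different tools, using an agreement threshold that plays a role analogous to the threshold parameter of the EM-based method discussed above. This is a minimal illustration under assumptions, not the authors' implementation; the function, the tool names, and the threshold value are hypothetical.

```python
# Minimal sketch: aggregate binary segmentation masks from several tools.
# Illustrative only; not the FourEyes implementation. Tool names and the
# default threshold are assumptions made for this example.
import numpy as np

def aggregate_masks(masks, threshold=0.5):
    """Combine binary masks (one per worker/tool) into a single mask.

    masks: list of HxW arrays, 1 = object, 0 = background.
    threshold: fraction of masks that must mark a pixel as object for it
               to be kept (0.5 corresponds to a pixel-wise majority vote).
    """
    stack = np.stack([m.astype(np.float32) for m in masks], axis=0)
    vote_fraction = stack.mean(axis=0)   # per-pixel agreement across tools
    return (vote_fraction >= threshold).astype(np.uint8)

if __name__ == "__main__":
    # Toy example: masks drawn with three hypothetical tools.
    rng = np.random.default_rng(0)
    trace_mask = rng.integers(0, 2, size=(4, 4), dtype=np.uint8)
    polygon_mask = rng.integers(0, 2, size=(4, 4), dtype=np.uint8)
    scribble_mask = rng.integers(0, 2, size=(4, 4), dtype=np.uint8)
    combined = aggregate_masks([trace_mask, polygon_mask, scribble_mask],
                               threshold=0.5)
    print(combined)
```

In this sketch, a threshold of 0.5 reduces to a majority vote across tools, while raising it yields more conservative boundaries; because each tool tends to err in different places, disagreement across tools is what allows such thresholding to offset tool-specific biases.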

[1] Harmanpreet Kaur et al. Plexiglass: Multiplexing Passive and Active Tasks for More Efficient Crowdsourcing, 2018, HCOMP.

[2] Mausam et al. Dynamically Switching between Synergistic Workflows for Crowdsourcing, 2012, AAAI.

[3] Mausam et al. Crowdsourcing Multi-Label Classification for Taxonomy Creation, 2013, HCOMP.

[4] Margrit Betke et al. Investigating the Influence of Data Familiarity to Improve the Design of a Crowdsourcing Image Annotation System, 2016, HCOMP.

[5] Neil T. Heffernan et al. AXIS: Generating Explanations at Scale with Learnersourcing and Machine Learning, 2016, L@S.

[6] Pietro Perona et al. The Multidimensional Wisdom of Crowds, 2010, NIPS.

[7] Ross B. Girshick et al. Mask R-CNN, 2017, arXiv:1703.06870.

[8] Scott R. Klemmer et al. Shepherding the crowd yields better work, 2012, CSCW.

[9] Björn Hartmann et al. Collaboratively crowdsourcing workflows with Turkomatic, 2012, CSCW.

[10] Thomas G. Dietterich. Multiple Classifier Systems, 2000, Lecture Notes in Computer Science.

[11] Panagiotis G. Ipeirotis et al. Quality management on Amazon Mechanical Turk, 2010, HCOMP '10.

[12] Aniket Kittur et al. CrowdForge: crowdsourcing complex work, 2011, UIST.

[13] Walter S. Lasecki et al. Real-time captioning by groups of non-experts, 2012, UIST.

[14] Walter S. Lasecki et al. Self-correcting crowds, 2012, CHI EA '12.

[15] A. P. Dawid et al. Maximum Likelihood Estimation of Observer Error-Rates Using the EM Algorithm, 1979.

[16] J. Brigham et al. Thirty years of investigating the own-race bias in memory for faces: A meta-analytic review, 2001.

[17] Jaime Teevan et al. CrowdMask: Using Crowds to Preserve Privacy in Crowd-Powered Systems via Progressive Filtering, 2017, HCOMP.

[18] Michael S. Bernstein et al. Mechanical Turk is Not Anonymous, 2013.

[19] Juho Kim et al. ConceptScape: Collaborative Concept Mapping for Video Learning, 2018, CHI.

[20] Amaia Salvador et al. Click'n'Cut: Crowdsourced Interactive Segmentation with Object Candidates, 2014, CrowdMM '14.

[21] Noah Snavely et al. OpenSurfaces, 2013, ACM Trans. Graph.

[22] Rob Miller et al. Real-time crowd control of existing interfaces, 2011, UIST.

[23] Antonio Torralba et al. LabelMe: A Database and Web-Based Tool for Image Annotation, 2008, International Journal of Computer Vision.

[24] Mausam et al. Crowdsourcing Control: Moving Beyond Multiple Choice, 2012, UAI.

[25] Aniket Kittur et al. Instrumenting the crowd: using implicit behavioral measures to predict task performance, 2011, UIST.

[26] Danai Koutra et al. Glance: rapidly coding behavioral video with the crowd, 2014, UIST.

[27] Wojciech Matusik et al. Crowd-Guided Ensembles: How Can We Choreograph Crowd Workers for Video Segmentation?, 2018, CHI.

[28] Thomas P. Moran et al. Questions, Options, and Criteria: Elements of Design Space Analysis, 1991, Hum. Comput. Interact.

[29] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1995, EuroCOLT.

[30] Eric Horvitz et al. Volunteering Versus Work for Pay: Incentives and Tradeoffs in Crowdsourcing, 2013, HCOMP.

[31] Oleksandr Makeyev et al. Neural network with ensembles, 2010, The 2010 International Joint Conference on Neural Networks (IJCNN).

[32] Michael S. Bernstein et al. Soylent: a word processor with a crowd inside, 2010, UIST.

[33] Yang Li et al. Bootstrapping personal gesture shortcuts with the wisdom of the crowd and handwriting recognition, 2012, CHI.

[34] Richard M. Young et al. Options and Criteria: Elements of design space analysis, 1991.

[35] IEEE Xplore. IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors, 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[36] Lydia B. Chilton et al. TurKit: human computation algorithms on Mechanical Turk, 2010, UIST.

[37] Jonathan Krause et al. Scalable Annotation of Fine-Grained Categories Without Experts, 2017, CHI.

[38] Henry A. Kautz et al. Real-time crowd labeling for deployable activity recognition, 2013, CSCW.

[39] Fan Yang et al. Two Tools are Better Than One: Tool Diversity as a Means of Improving Aggregate Crowd Performance, 2018, IUI.

[40] Fanglin Chen et al. WearMail: On-the-Go Access to Information in Your Email with a Privacy-Preserving Human Computation Workflow, 2017, UIST.

[41] Walter S. Lasecki et al. LegionTools: A Toolkit + UI for Recruiting and Routing Crowds to Synchronous Real-Time Tasks, 2015, UIST.

[42] Yoav Freund et al. A decision-theoretic generalization of on-line learning and an application to boosting, 1997, EuroCOLT.

[43] Brendan T. O'Connor et al. Cheap and Fast – But is it Good? Evaluating Non-Expert Annotations for Natural Language Tasks, 2008, EMNLP.

[44] Krzysztof Z. Gajos et al. Crowdsourcing step-by-step information extraction to enhance existing how-to videos, 2014, CHI.

[45] Walter S. Lasecki et al. Warping time for more effective real-time crowdsourcing, 2013, CHI.

[46] Fei-Fei Li et al. What's the Point: Semantic Segmentation with Point Supervision, 2015, ECCV.

[47] Pietro Perona et al. Microsoft COCO: Common Objects in Context, 2014, ECCV.

[48] Shane Torbert. Applied Computer Science, 2012, Springer New York.

[49] Aniket Kittur et al. Crowdlines: Supporting Synthesis of Diverse Information Sources through Crowdsourced Outlines, 2015, HCOMP.

[50] Javier R. Movellan et al. Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise, 2009, NIPS.

[51] Jian Sun et al. ScribbleSup: Scribble-Supervised Convolutional Networks for Semantic Segmentation, 2016, IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[52] Trevor Darrell et al. Fully Convolutional Networks for Semantic Segmentation, 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.