Crowd-Guided Ensembles: How Can We Choreograph Crowd Workers for Video Segmentation?

In this work, we propose two ensemble methods that leverage a crowd workforce to improve video annotation, with a focus on video object segmentation. Their shared principle is that while individual candidate results are often insufficient on their own, they tend to complement each other, so they can be combined into something better than any single result: the very spirit of collaborative work. First, we extend a standard polygon-drawing interface to let workers annotate negative space, and we combine the work of multiple workers instead of relying on a single best one, as is common in crowdsourced image segmentation. Second, we present a method to combine multiple automatic propagation algorithms with the help of the crowd. Such a combination requires an understanding of where the algorithms fail, which we gather using a novel coarse scribble video annotation task. We evaluate our ensemble methods, discuss our design choices for them, and make our web-based crowdsourcing tools and results publicly available.
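The complementarity principle can be illustrated with a toy sketch. This is not the combination method of this work: it assumes a plain pixel-wise majority vote over binary masks, and the names `majority_vote`, `iou`, `truth`, and `workers`, as well as the tiny masks, are hypothetical and chosen only for illustration.

```python
from typing import List

Mask = List[List[int]]  # 1 = object pixel, 0 = background pixel


def majority_vote(masks: List[Mask]) -> Mask:
    """Pixel-wise strict majority vote over equally sized binary masks.

    Assumption: a simple vote stands in for whatever combination
    rule an actual crowd ensemble would use.
    """
    if not masks:
        raise ValueError("need at least one mask")
    height, width = len(masks[0]), len(masks[0][0])
    needed = len(masks) // 2 + 1  # votes required for a strict majority
    return [
        [1 if sum(m[y][x] for m in masks) >= needed else 0 for x in range(width)]
        for y in range(height)
    ]


def iou(a: Mask, b: Mask) -> float:
    """Intersection over union of two binary masks."""
    pairs = [(pa, pb) for ra, rb in zip(a, b) for pa, pb in zip(ra, rb)]
    inter = sum(pa & pb for pa, pb in pairs)
    union = sum(pa | pb for pa, pb in pairs)
    return inter / union if union else 1.0


# Hypothetical ground truth and three imperfect worker masks.
truth = [[0, 1, 1, 0],
         [0, 1, 1, 0]]
workers = [
    [[0, 1, 1, 1], [0, 1, 1, 0]],  # one extra pixel on the right
    [[0, 1, 0, 0], [0, 1, 1, 0]],  # one missing object pixel
    [[1, 1, 1, 0], [0, 1, 1, 0]],  # one extra pixel on the left
]

combined = majority_vote(workers)
for i, w in enumerate(workers):
    print(f"worker {i}: IoU = {iou(w, truth):.2f}")
print(f"combined: IoU = {iou(combined, truth):.2f}")
```

In this contrived case each worker makes a different mistake, so the vote cancels all of them. Real worker errors are often correlated, which is one reason a more careful combination scheme may be needed.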
