Predicting Foreground Object Ambiguity and Efficiently Crowdsourcing the Segmentation(s)

We propose the ambiguity problem for the foreground object segmentation task and motivate the importance of estimating and accounting for this ambiguity when designing vision systems. Specifically, we distinguish between images which lead multiple annotators to segment different foreground objects (ambiguous) versus minor inter-annotator differences of the same object. Taking images from eight widely used datasets, we crowdsource labeling the images as “ambiguous” or “not ambiguous” to segment in order to construct a new dataset we call STATIC. Using STATIC, we develop a system that automatically predicts which images are ambiguous. Experiments demonstrate the advantage of our prediction system over existing saliency-based methods on images from vision benchmarks and images taken by blind people who are trying to recognize objects in their environment. Finally, we introduce a crowdsourcing system to achieve cost savings for collecting the diversity of all valid “ground truth” foreground object segmentations by collecting extra segmentations only when ambiguity is expected. Experiments show our system eliminates up to 47% of human effort compared to existing crowdsourcing methods with no loss in capturing the diversity of ground truths.

[1]  Sotirios A. Tsaftaris,et al.  Medical Image Computing and Computer Assisted Intervention , 2017 .

[2]  Adriana Kovashka,et al.  Discovering Attribute Shades of Meaning with the Crowd , 2014, International Journal of Computer Vision.

[3]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Jitendra Malik,et al.  A database of human segmented natural images and its application to evaluating segmentation algorithms and measuring ecological statistics , 2001, Proceedings Eighth IEEE International Conference on Computer Vision. ICCV 2001.

[5]  Andrew Blake,et al.  "GrabCut" , 2004, ACM Trans. Graph..

[6]  Kristen Grauman,et al.  Cost-Sensitive Active Visual Category Learning , 2010, International Journal of Computer Vision.

[7]  Eric Gilbert,et al.  What if we ask a different question?: social inferences create product ratings faster , 2014, CHI.

[8]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[9]  Anthony P. Reeves,et al.  A comparison of ground truth estimation methods , 2010, International Journal of Computer Assisted Radiology and Surgery.

[10]  Aaron Steinfeld,et al.  An Assisted Photography Framework to Help Visually Impaired Users Properly Aim a Camera , 2014, TCHI.

[11]  HuXiaowei,et al.  Salient Object Detection , 2017 .

[12]  Ieee Xplore,et al.  IEEE Transactions on Pattern Analysis and Machine Intelligence Information for Authors , 2022, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[13]  Ali Borji,et al.  Salient Object Detection: A Benchmark , 2015, IEEE Transactions on Image Processing.

[14]  Jian Sun,et al.  Salient object detection by composition , 2011, 2011 International Conference on Computer Vision.

[15]  Antonio Torralba,et al.  Context-based vision system for place and object recognition , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[16]  Sabine Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[17]  Petra Perner,et al.  Machine Learning and Data Mining in Pattern Recognition , 2009, Lecture Notes in Computer Science.

[18]  Guillermo Sapiro,et al.  Geodesic Active Contours , 1995, International Journal of Computer Vision.

[19]  Avrim Blum,et al.  Veritas: Combining Expert Opinions without Labeled Data , 2008, 2008 20th IEEE International Conference on Tools with Artificial Intelligence.

[20]  Matthew Lease,et al.  SQUARE: A Benchmark for Research on Computing Crowd Consensus , 2013, HCOMP.

[21]  Erin Brady,et al.  Visual challenges in the everyday lives of blind people , 2013, CHI.

[22]  Shi-Min Hu,et al.  Sketch2Photo: internet image montage , 2009, ACM Trans. Graph..

[23]  Lihi Zelnik-Manor,et al.  How to Evaluate Foreground Maps , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[24]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[25]  Pietro Perona,et al.  The Multidimensional Wisdom of Crowds , 2010, NIPS.

[26]  Pietro Perona,et al.  Online crowdsourcing: Rating annotators and obtaining cost-effective labels , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition - Workshops.

[27]  Antonio Criminisi,et al.  TextonBoost: Joint Appearance, Shape and Context Modeling for Multi-class Object Recognition and Segmentation , 2006, ECCV.

[28]  Jonathan T. Barron,et al.  Multiscale Combinatorial Grouping , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Margrit Betke,et al.  Salient Object Subitizing , 2015, CVPR.

[30]  W. Clem Karl,et al.  A Real-Time Algorithm for the Approximation of Level-Set-Based Curve Evolution , 2008, IEEE Transactions on Image Processing.

[31]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[32]  Jeffrey P. Bigham,et al.  Supporting blind photography , 2011, ASSETS.

[33]  Miguel A. García-Pérez,et al.  Visual inhomogeneity and eye movements in multistable perception , 1989, Perception & psychophysics.

[34]  Ronen Basri,et al.  Image Segmentation by Probabilistic Bottom-Up Aggregation and Cue Integration , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[35]  Javier R. Movellan,et al.  Whose Vote Should Count More: Optimal Integration of Labels from Labelers of Unknown Expertise , 2009, NIPS.

[36]  C. Lawrence Zitnick,et al.  Fast Edge Detection Using Structured Forests , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[37]  Ali Borji,et al.  What stands out in a scene? A study of human explicit saliency judgment , 2013, Vision Research.

[38]  Bill Triggs,et al.  Histograms of oriented gradients for human detection , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[39]  Alexander C. Berg,et al.  Finding iconic images , 2009, 2009 IEEE Computer Society Conference on Computer Vision and Pattern Recognition Workshops.

[40]  Karl Stratos,et al.  Understanding and predicting importance in images , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[41]  Kristen Grauman,et al.  Predicting Sufficient Annotation Strength for Interactive Foreground Segmentation , 2013, 2013 IEEE International Conference on Computer Vision.

[42]  Timo Kohlberger,et al.  Evaluating Segmentation Error without Ground Truth , 2012, MICCAI.

[43]  Nanning Zheng,et al.  Learning to Detect a Salient Object , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44]  Linda G. Shapiro,et al.  Estimating Image Segmentation Difficulty , 2011, MLDM.

[45]  Jingdong Wang,et al.  Salient Object Detection: A Discriminative Regional Feature Integration Approach , 2013, International Journal of Computer Vision.

[46]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[47]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[48]  Trevor Darrell,et al.  Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[49]  Shi-Min Hu,et al.  Sketch2Photo: internet image montage , 2009, ACM Trans. Graph..

[50]  Adriana Kovashka,et al.  WhittleSearch: Interactive Image Search with Relative Attribute Feedback , 2015, International Journal of Computer Vision.

[51]  David A. Leopold,et al.  Binocular rivalry and the illusion of monocular vision , 2004 .

[52]  James J. Little,et al.  Curious George: An attentive semantic robot , 2008, Robotics Auton. Syst..

[53]  Shi-Min Hu,et al.  Global contrast based salient region detection , 2011, CVPR 2011.

[54]  Margrit Betke,et al.  Investigating the Influence of Data Familiarity to Improve the Design of a Crowdsourcing Image Annotation System , 2016, HCOMP.

[55]  Aaron D. Shaw,et al.  Designing incentives for inexpert human raters , 2011, CSCW.

[56]  Devi Parikh,et al.  Image specificity , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[57]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[58]  C.-C. Jay Kuo,et al.  Measuring and Predicting Tag Importance for Image Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[59]  Jeffrey P. Bigham,et al.  Real time object scanning using a mobile phone and cloud-based visual search engine , 2013, ASSETS.

[60]  William M. Wells,et al.  Simultaneous truth and performance level estimation (STAPLE): an algorithm for the validation of image segmentation , 2004, IEEE Transactions on Medical Imaging.

[61]  Rob Miller,et al.  VizWiz: nearly real-time answers to visual questions , 2010, UIST.

[62]  Shimon Ullman,et al.  Combined Top-Down/Bottom-Up Segmentation , 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[63]  Margrit Betke,et al.  Salient Object Subitizing , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  S. Süsstrunk,et al.  Frequency-tuned salient region detection , 2009, CVPR 2009.

[65]  Vladimir Kolmogorov,et al.  "GrabCut": interactive foreground extraction using iterated graph cuts , 2004, ACM Trans. Graph..

[66]  Andrew Blake,et al.  Geodesic star convexity for interactive image segmentation , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.