Unconstrained Foreground Object Search

Many people search for foreground objects to use when editing images. Existing methods can retrieve candidate objects for this purpose, but they are constrained to returning objects that belong to a pre-specified semantic class. We instead propose the novel problem of unconstrained foreground object (UFO) search and introduce a solution that supports efficient search by encoding the background image in the same latent space as the candidate foreground objects. A key contribution of our work is a cost-free, scalable approach for creating a large-scale training dataset with a variety of foreground objects of differing semantic categories per image location. Quantitative and human-perception experiments on two diverse datasets demonstrate the advantage of our UFO search solution over related baselines.
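To make the shared-latent-space idea concrete, the following is a minimal sketch of retrieval once a background image and the candidate foreground objects have been embedded into the same space. It is illustrative only and not the method described here: the encoders are not shown, the three-dimensional vectors are made-up placeholders, the function names `l2_normalize` and `rank_foregrounds` are hypothetical, and the use of cosine similarity as the ranking score is an assumption, since the abstract does not state which similarity measure is used.

```python
import math


def l2_normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0.0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]


def rank_foregrounds(background_embedding, foreground_embeddings, top_k=3):
    """Rank candidate foregrounds by cosine similarity to the background query.

    background_embedding: list of floats (hypothetical encoder output).
    foreground_embeddings: dict mapping an object name to its embedding.
    Returns up to top_k (name, score) pairs, best match first.
    """
    query = l2_normalize(background_embedding)
    scored = []
    for name, embedding in foreground_embeddings.items():
        candidate = l2_normalize(embedding)
        score = sum(q * c for q, c in zip(query, candidate))
        scored.append((name, score))
    # Higher cosine similarity means a closer match in the shared space.
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy placeholder embeddings; real ones would come from trained encoders.
background = [0.9, 0.1, 0.0]
candidates = {
    "dog": [1.0, 0.0, 0.0],
    "kite": [0.0, 1.0, 0.0],
    "boat": [0.0, 0.0, 1.0],
}
results = rank_foregrounds(background, candidates, top_k=2)
```

Because the candidates are not filtered by class label, objects of any semantic category can be returned for the same query. The linear scan above is only for clarity; for a large candidate pool, an approximate nearest-neighbor index over precomputed foreground embeddings would be a natural substitute.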
