Efficiently Detecting Plausible Locations for Object Placement Using Masked Convolutions

Inserting new objects into images is an important problem for both artistic image editing and data augmentation. For an image manipulation to succeed, both the plausible placement of the new objects and their blending into the image are critical. In this paper, we propose a fast method for automatically selecting plausible locations for object insertion into images. Like previous work, we approach object placement as a detection problem: given a bounding box, we evaluate whether an object is present inside the box based only on the neighborhood of the box. However, previous work requires a forward pass for each potential bounding box location. We instead propose to use masked convolutions to compute feature maps for the left, right, top and bottom contexts just once per image. These features are combined in such a way that no information from inside a bounding box is propagated to the final classifier, which allows the model to evaluate a grid of proposals on the feature maps rather than on the image and speeds up inference dramatically. We validate that our model can generate plausible placements with experiments on the COCO dataset and with a user study. Compared to a patch-based approach, our method gains speed at some cost in performance.
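The idea of directional context maps can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: the uniform-weight kernels, the two-layer depth, and the names `masked_conv`, `context_maps` and `box_features` are assumptions made for illustration, and the way the four contexts are sampled at the box edges is only one possible combination rule. The sketch shows that when each kernel sees only one side of its center, the features read at the borders of a box do not depend on the pixels inside it, so the maps can be computed once and reused for every proposal.

```python
# Hypothetical sketch of masked convolutions for box-context features.
# Kernel weights, depth and sampling rule are illustrative assumptions.

def masked_conv(img, direction, k=1):
    """Sum a (2k+1)x(2k+1) window, keeping only the taps that lie strictly
    on one side of the center (a uniform-weight masked kernel)."""
    h, w = len(img), len(img[0])
    out = [[0.0] * w for _ in range(h)]
    for r in range(h):
        for c in range(w):
            total = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    keep = {
                        "left": dc < 0,
                        "right": dc > 0,
                        "top": dr < 0,
                        "bottom": dr > 0,
                    }[direction]
                    rr, cc = r + dr, c + dc
                    if keep and 0 <= rr < h and 0 <= cc < w:
                        total += img[rr][cc]
            out[r][c] = total
    return out


def context_maps(img, layers=2):
    """Compute the four directional feature maps once per image."""
    maps = {}
    for direction in ("left", "right", "top", "bottom"):
        feat = img
        for _ in range(layers):
            feat = masked_conv(feat, direction)
        maps[direction] = feat
    return maps


def box_features(maps, box):
    """Read each context at the matching box edge (inclusive coordinates)."""
    r0, c0, r1, c1 = box
    rm, cm = (r0 + r1) // 2, (c0 + c1) // 2
    return (
        maps["left"][rm][c0],    # depends only on columns < c0
        maps["right"][rm][c1],   # depends only on columns > c1
        maps["top"][r0][cm],     # depends only on rows < r0
        maps["bottom"][r1][cm],  # depends only on rows > r1
    )


# Toy 8x8 image and one proposal box (rows 2..5, columns 3..6).
img = [[float(r * 8 + c) for c in range(8)] for r in range(8)]
box = (2, 3, 5, 6)

# Overwrite every pixel inside the box; the box features must not change.
edited = [row[:] for row in img]
for r in range(box[0], box[2] + 1):
    for c in range(box[1], box[3] + 1):
        edited[r][c] = 999.0

same = box_features(context_maps(img), box) == box_features(context_maps(edited), box)
print("features independent of box content:", same)
```

Because the maps do not depend on the proposal, scoring a grid of boxes reduces to repeated lookups in `maps` followed by a classifier, rather than one forward pass per box.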
