What and where: A context-based recommendation system for object insertion

We propose a novel problem revolving around two tasks: (i) given a scene, recommend objects to insert, and (ii) given an object category, retrieve suitable background scenes. A bounding box for the inserted object is predicted in both tasks, which helps downstream applications such as semiautomated advertising and video composition. The major challenge lies in the fact that the target object is neither present nor localized in the input, and furthermore, available datasets only provide scenes with existing objects. To tackle this problem, we build an unsupervised algorithm based on object-level contexts, which explicitly models the joint probability distribution of object categories and bounding boxes using a Gaussian mixture model. Experiments on our own annotated test set demonstrate that our system outperforms existing baselines on all sub-tasks, and does so using a unified framework. Future extensions and applications are suggested.

[1]  Bracha Shapira,et al.  Recommender Systems Handbook , 2015, Springer US.

[2]  Wei Liu,et al.  Learning to Hash for Indexing Big DataVA Survey Thispaperprovidesreaderswithasystematicunderstandingofinsights,pros,andcons of the emerging indexing and search methods for Big Data. , 2016 .

[3]  Trevor Darrell,et al.  Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[4]  Zinaida Galkina Graphic designer-client relationships - case study , 2010 .

[5]  Dumitru Erhan,et al.  Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6]  Ivan Laptev,et al.  ContextLocNet: Context-Aware Deep Network Models for Weakly Supervised Localization , 2016, ECCV.

[7]  Hideki Todo,et al.  Estimating reflectance and shape of objects from a single cartoon-shaded image , 2016, Computational Visual Media.

[8]  Kaiming He,et al.  Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9]  Benjamin Cohen,et al.  Where and Who? Automatic Semantic-Aware Person Composition , 2017, 2018 IEEE Winter Conference on Applications of Computer Vision (WACV).

[10]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[11]  Xin Jin,et al.  Image editing by object-aware optimal boundary searching and mixed-domain composition , 2018, Computational Visual Media.

[12]  Ralph R. Martin,et al.  PatchNet: a patch-based image representation for interactive library-driven image editing , 2013, ACM Trans. Graph..

[13]  Seunghoon Hong,et al.  Learning Hierarchical Semantic Image Manipulation through Structured Representations , 2018, NeurIPS.

[14]  Varsha Negi Recommender System , 2013 .

[15]  Geoffrey E. Hinton,et al.  ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[16]  Bolei Zhou,et al.  Learning Deep Features for Discriminative Localization , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[17]  Thomas S. Huang,et al.  Generative Image Inpainting with Contextual Attention , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[18]  Wei Liu,et al.  ParseNet: Looking Wider to See Better , 2015, ArXiv.

[19]  Qi Tian,et al.  SIFT Meets CNN: A Decade Survey of Instance Retrieval , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20]  Peter Hall,et al.  A Survey of 3D Indoor Scene Synthesis , 2019, Journal of Computer Science and Technology.

[21]  Ersin Yumer,et al.  ST-GAN: Spatial Transformer Generative Adversarial Networks for Image Compositing , 2018, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[22]  Michael S. Bernstein,et al.  Image retrieval using scene graphs , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23]  Lei Zhang,et al.  Bottom-Up and Top-Down Attention for Image Captioning and Visual Question Answering , 2017, 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition.

[24]  Kaiming He,et al.  Mask R-CNN , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[25]  Jaana Kekäläinen,et al.  Cumulated gain-based evaluation of IR techniques , 2002, TOIS.

[26]  Andrea Vedaldi,et al.  Weakly Supervised Deep Detection Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[27]  Р Ю Чуйков,et al.  Обнаружение транспортных средств на изображениях загородных шоссе на основе метода Single shot multibox Detector , 2017 .

[28]  Andrea Vedaldi,et al.  Objects in Context , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[29]  Gaël Varoquaux,et al.  Scikit-learn: Machine Learning in Python , 2011, J. Mach. Learn. Res..

[30]  Lei Zhang,et al.  Bottom-Up and Top-Down Attention for Image Captioning and VQA , 2017, ArXiv.

[31]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[32]  Danfei Xu,et al.  Scene Graph Generation by Iterative Message Passing , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Vladlen Koltun,et al.  Multi-Scale Context Aggregation by Dilated Convolutions , 2015, ICLR.

[34]  Michael S. Bernstein,et al.  Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations , 2016, International Journal of Computer Vision.

[35]  Jan Kautz,et al.  Context-aware Synthesis and Placement of Object Instances , 2018, NeurIPS.

[36]  Qi Tian,et al.  Recent Advance in Content-based Image Retrieval: A Literature Survey , 2017, ArXiv.

[37]  Wei Liu,et al.  SSD: Single Shot MultiBox Detector , 2015, ECCV.