论文信息 - Sitcom-star-based clothing retrieval for video advertising: a deep learning framework

Sitcom-star-based clothing retrieval for video advertising: a deep learning framework

This paper presents a novel learning-based framework for video content-based advertising, DeepLink, which aims at linking Sitcom-stars and online shops with clothing retrieval by using state-of-the-art deep convolutional neural networks (CNNs). Specifically, several deep CNN models are adopted for composing multiple sub-modules in DeepLink, including human-body detection, human pose selection, face verification, clothing detection and retrieval from advertisements (ads) pool that is constructed by clothing images crawled from real-world online shops. For clothing detection and retrieval from ad-images, we firstly transfer the state-of-the-art deep CNN models to our data domain, and then train corresponding models based on our constructed large-scale clothes datasets. Extensive experimental results demonstrate the feasibility and efficacy of our proposed clothing-based video advertising system.

[1] Rob Fergus,et al. Visualizing and Understanding Convolutional Networks , 2013, ECCV.

[2] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[3] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[4] Guigang Zhang,et al. Deep Learning , 2016, Int. J. Semantic Comput..

[5] James Philbin,et al. FaceNet: A unified embedding for face recognition and clustering , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[6] Subrahmanyam Murala,et al. Local Tetra Patterns: A New Feature Descriptor for Content-Based Image Retrieval , 2012, IEEE Transactions on Image Processing.

[7] Matti Pietikäinen,et al. Multiresolution Gray-Scale and Rotation Invariant Texture Classification with Local Binary Patterns , 2002, IEEE Trans. Pattern Anal. Mach. Intell..

[8] Iasonas Kokkinos,et al. DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[9] Koen E. A. van de Sande,et al. Selective Search for Object Recognition , 2013, International Journal of Computer Vision.

[10] Qiang Chen,et al. Cross-Domain Image Retrieval with a Dual Attribute-Aware Ranking Network , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[11] Marwan Mattar,et al. Labeled Faces in the Wild: A Database forStudying Face Recognition in Unconstrained Environments , 2008 .

[12] Jorge García Duque,et al. Bringing Content Awareness to Web-Based IDTV Advertising , 2012, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[13] Xiang Zhang,et al. OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks , 2013, ICLR.

[14] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[15] Dumitru Erhan,et al. Scalable, High-Quality Object Detection , 2014, ArXiv.

[16] Geoffrey E. Hinton,et al. ImageNet classification with deep convolutional neural networks , 2012, Commun. ACM.

[17] Yoshua Bengio,et al. Gradient-based learning applied to document recognition , 1998, Proc. IEEE.

[18] Tal Hassner,et al. Face recognition in unconstrained videos with matched background similarity , 2011, CVPR 2011.

[19] Jian Sun,et al. Spatial Pyramid Pooling in Deep Convolutional Networks for Visual Recognition , 2014, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[20] Xiaogang Wang,et al. DeepFashion: Powering Robust Clothes Recognition and Retrieval with Rich Annotations , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[21] Gregory Shakhnarovich,et al. FractalNet: Ultra-Deep Neural Networks without Residuals , 2016, ICLR.

[22] Sergey Ioffe,et al. Rethinking the Inception Architecture for Computer Vision , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[23] Harish Katti,et al. CAVVA: Computational Affective Video-in-Video Advertising , 2014, IEEE Transactions on Multimedia.

[24] Tao Mei,et al. VideoSense: A Contextual In-Video Advertising System , 2009, IEEE Transactions on Circuits and Systems for Video Technology.

[25] Trevor Darrell,et al. Rich Feature Hierarchies for Accurate Object Detection and Semantic Segmentation , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[26] Luc Van Gool,et al. The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[27] Christian Szegedy,et al. DeepPose: Human Pose Estimation via Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[28] Dumitru Erhan,et al. Going deeper with convolutions , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[29] Xiaogang Wang,et al. DeepID3: Face Recognition with Very Deep Neural Networks , 2015, ArXiv.

[30] Xiaogang Wang,et al. Deep Convolutional Network Cascade for Facial Point Detection , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.

[31] Tommy W. S. Chow,et al. Object-Level Video Advertising: An Optimization Framework , 2017, IEEE Transactions on Industrial Informatics.

[32] Hanjiang Lai,et al. Supervised Hashing for Image Retrieval via Image Representation Learning , 2014, AAAI.

[33] Trevor Darrell,et al. Caffe: Convolutional Architecture for Fast Feature Embedding , 2014, ACM Multimedia.

[34] Fei-Fei Li,et al. ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[35] Xiaogang Wang,et al. Deep Learning Face Representation by Joint Identification-Verification , 2014, NIPS.

[36] Sergey Ioffe,et al. Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning , 2016, AAAI.

[37] Ross B. Girshick,et al. Fast R-CNN , 2015, 1504.08083.

[38] Ali Farhadi,et al. You Only Look Once: Unified, Real-Time Object Detection , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[39] Andrew W. Fitzgibbon,et al. Real-time human pose recognition in parts from single depth images , 2011, CVPR 2011.

[40] Shengcai Liao,et al. Learning Face Representation from Scratch , 2014, ArXiv.

[41] Sergey Ioffe,et al. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift , 2015, ICML.

[42] 한보형,et al. Learning Deconvolution Network for Semantic Segmentation , 2015 .

[43] Kaiming He,et al. Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks , 2015, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[44] Changsheng Xu,et al. Real time advertisement insertion in baseball video based on advertisement effect , 2005, MULTIMEDIA '05.

[45] Qi Tian,et al. Interactive ads recommendation with contextual search on product topic space , 2011, Multimedia Tools and Applications.

[46] David A. McAllester,et al. Object Detection with Discriminatively Trained Part Based Models , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[47] Dumitru Erhan,et al. Scalable Object Detection Using Deep Neural Networks , 2013, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[48] Yong Feng,et al. Multi-label learning with label relevance in advertising video , 2016, Neurocomputing.

[49] Jiwen Lu,et al. Deep hashing for compact binary codes learning , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[50] Fei-Fei Li,et al. Deep visual-semantic alignments for generating image descriptions , 2015, CVPR.

[51] Jen-Hao Hsiao,et al. Deep learning of binary hash codes for fast image retrieval , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition Workshops (CVPRW).

[52] Xiaogang Wang,et al. Deeply learned face representations are sparse, selective, and robust , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[53] Sean Hughes,et al. Clustering by Fast Search and Find of Density Peaks , 2016 .

[54] S. Shan,et al. VIPLFaceNet: an open source deep face recognition SDK , 2016, Frontiers of Computer Science.

[55] Bernt Schiele,et al. 2D Human Pose Estimation: New Benchmark and State of the Art Analysis , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[56] José Juan Pazos-Arias,et al. Cloud-Based Personalization of New Advertising and e-Commerce Models for Video Consumption , 2013, Comput. J..

[57] Geoffrey E. Hinton,et al. Reducing the Dimensionality of Data with Neural Networks , 2006, Science.

[58] Svetlana Lazebnik,et al. Where to Buy It: Matching Street Clothing Photos in Online Shops , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[59] Xiaogang Wang,et al. Deep Learning Face Representation from Predicting 10,000 Classes , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[60] Wei Liu,et al. SSD: Single Shot MultiBox Detector , 2015, ECCV.

[61] Xiaoyang Tan,et al. Enhanced Local Texture Feature Sets for Face Recognition Under Difficult Lighting Conditions , 2007, IEEE Transactions on Image Processing.