Social Video Advertisement Replacement and its Evaluation in Convolutional Neural Networks

This paper introduces a method that uses deep convolutional neural networks (CNNs) to automatically replace advertisement (AD) photos in social (or self-media) videos, and it provides an evaluation method suited to comparing different CNNs. An AD photo can replace a picture that appears inside a video. However, if a person occludes that picture in the original video, the newly pasted AD photo covers the part of the person that was in front of it. A deep learning algorithm is therefore used to segment the person from the video. The segmented human pixels are then pasted back over the occluded area, so that the replaced AD photo looks natural in the video. This process requires the predicted occlusion edge to be close to the ground-truth occlusion edge so that the AD photo is occluded naturally. This research therefore introduces a curve fitting method to measure the error of the predicted occlusion edge. Using this method, three CNN models are applied to AD replacement and compared: the mask region-based convolutional neural network (Mask R-CNN), a recurrent network for video object segmentation (RVOS), and DeepLabv3. The experimental results report the segmentation accuracy of each model, and DeepLabv3 performs best.
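The paste-back step described above can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes the human segmentation mask has already been produced by some model, that the AD photo is placed at an axis-aligned rectangle, and that images are plain nested lists of RGB tuples. The function name `replace_ad` and its parameters are hypothetical, chosen only for this illustration.

```python
from typing import List, Sequence, Tuple

Pixel = Tuple[int, int, int]
Image = List[List[Pixel]]
Mask = Sequence[Sequence[bool]]


def replace_ad(frame: Image, ad: Image, human_mask: Mask,
               top: int, left: int) -> Image:
    """Paste `ad` into `frame` with its top-left corner at (top, left).

    Pixels flagged True in `human_mask` (assumed to come from a
    segmentation model) keep their original value, so the person
    stays in front of the pasted AD photo.
    """
    out = [list(row) for row in frame]  # copy rows; the input frame is not modified
    for r, ad_row in enumerate(ad):
        for c, ad_pixel in enumerate(ad_row):
            y, x = top + r, left + c
            if not (0 <= y < len(out) and 0 <= x < len(out[y])):
                continue  # this AD pixel falls outside the frame
            if human_mask[y][x]:
                continue  # a person occludes this pixel: keep the original
            out[y][x] = ad_pixel
    return out


# Usage: a 2x3 grey frame, a 2x2 red AD, and one "human" pixel at row 0, column 1.
grey, red = (10, 10, 10), (255, 0, 0)
frame = [[grey] * 3 for _ in range(2)]
ad = [[red] * 2 for _ in range(2)]
mask = [[False, True, False],
        [False, False, False]]
result = replace_ad(frame, ad, mask, top=0, left=1)
```

In this toy example the pixel at row 0, column 1 keeps its original grey value because the mask marks it as human, while the rest of the AD rectangle turns red. Any error in the mask boundary shows up directly as a visible seam along the occlusion edge, which is why the edge accuracy of the segmentation model matters for this task.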
