Horizontal-to-Vertical Video Conversion

In the current era of social media and mobile platforms, mass consumers are migrating from horizontal video to vertical content delivered on hand-held devices. Accordingly, restoring the exposure of horizontal video has become vital and urgent. We address this need, for the first time, with an automated horizontal-to-vertical (H2V) video conversion framework. Essentially, the H2V framework performs subject-preserving video cropping, instantiated in the proposed Rank-SS module. Rank-SS uses object detection to discover candidate subjects, from which it selects the primary subject to preserve by leveraging location, appearance, and saliency cues in a convolutional neural network. Besides converting horizontal videos to vertical ones by cropping around the selected subject, the H2V framework also integrates automatic shot detection and multi-object tracking to accommodate long and complex videos. In addition, to support the development of H2V systems, we release the H2V-142K dataset, containing 125 videos (132K frames) and 9,500 cover images annotated with primary-subject bounding boxes. On H2V-142K and public object detection datasets, our method demonstrates superior subject selection accuracy compared to related solutions. Beyond that, our H2V framework has been deployed industrially, serving millions of daily active users, and exhibits favorable H2V conversion performance. By releasing this dataset together with our approach, we hope to pave the way for more horizontal-to-vertical video conversion solutions to come.
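
The cropping step described above, cutting a vertical window around the selected subject, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `vertical_crop_window`, the 9:16 target ratio, and the full-height crop are all assumptions made for illustration, and the subject box is taken as already chosen by some selection module.

```python
def vertical_crop_window(frame_w, frame_h, subject_box, aspect_w=9, aspect_h=16):
    """Return (left, top, right, bottom) of a full-height vertical crop.

    Hypothetical helper: the crop keeps the full frame height, takes the
    width implied by the target aspect ratio (assumed 9:16), and is centered
    horizontally on the subject box, then clamped to stay inside the frame.
    subject_box is (x_min, y_min, x_max, y_max) in pixels.
    """
    # Width of the vertical window, never wider than the frame itself.
    crop_w = min(frame_w, frame_h * aspect_w // aspect_h)

    # Horizontal center of the selected subject.
    x_min, _, x_max, _ = subject_box
    center_x = (x_min + x_max) / 2

    # Center the window on the subject, then clamp to the frame borders.
    left = int(center_x - crop_w / 2)
    left = max(0, min(left, frame_w - crop_w))

    return left, 0, left + crop_w, frame_h


if __name__ == "__main__":
    # A 1920x1080 frame with a subject near the middle of the frame.
    print(vertical_crop_window(1920, 1080, (860, 100, 1060, 900)))
```

When the subject sits near a frame border, the clamp shifts the window inward, so the subject stays inside the crop but is no longer centered. A full system would also need to smooth the window position over time within each shot, which this sketch does not attempt.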
