Video highlight extraction via content-aware deep transfer

In this paper, we focus on detecting highlights in online videos. Given the explosive growth of online videos, it is becoming increasingly important to single out highlights for audiences instead of requiring them to browse every tedious part of a video. Ideally, the content of the extracted highlights should be consistent with both the topic of the video and the preferences of the individual viewer. To this end, this paper introduces a novel content-aware approach that formulates highlight detection in a transfer learning framework. Experimental results on three different types of videos show that our content-aware highlight extraction method is particularly useful for fetching online video content, e.g. showing an abstraction of the entire video while focusing playback on the parts that match the user's queries.
