Reconstructive Sequence-Graph Network for Video Summarization

Current approaches mainly model the video as a frame sequence using recurrent neural networks. However, one potential limitation of sequence models is that they focus on capturing local neighborhood dependencies, while high-order, long-distance dependencies are not fully exploited. In general, the frames in each shot record a certain activity and vary smoothly over time, but multi-hop relationships occur frequently among shots. In this case, both local and global dependencies are important for understanding the video content. Motivated by this point, we propose a Reconstructive Sequence-Graph Network (RSGN) that encodes frames and shots hierarchically as a sequence and a graph, where the frame-level dependencies are encoded by a Long Short-Term Memory (LSTM) network and the shot-level dependencies are captured by a Graph Convolutional Network (GCN). The videos are then summarized by exploiting both the local and global dependencies among shots. In addition, a reconstructor is developed to reward the summary generator, so that the generator can be optimized in an unsupervised manner, which mitigates the lack of annotated data in video summarization. Experiments on three popular datasets demonstrate the superiority of the proposed approach.
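To make the shot-level graph idea concrete, the following is a minimal sketch of a single graph-convolution step over shot features, using the standard propagation rule ReLU(D^-1/2 A D^-1/2 H W). It is not the RSGN implementation: mean pooling stands in for the frame-level LSTM, the cosine-similarity adjacency is an assumption, the reconstructor is omitted, and the names shot_features, normalized_adjacency, and gcn_layer are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two feature vectors (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def shot_features(frames, boundaries):
    """Mean-pool the frame features of each shot.

    Assumption: this is a simple stand-in for the frame-level LSTM encoder;
    `boundaries` is a list of (start, end) frame indices, end exclusive.
    """
    shots = []
    for start, end in boundaries:
        segment = frames[start:end]
        dim = len(segment[0])
        shots.append([sum(f[d] for f in segment) / len(segment) for d in range(dim)])
    return shots


def normalized_adjacency(shots):
    """Build a shot graph from cosine similarity (an assumption) and
    apply the symmetric normalization D^-1/2 A D^-1/2 with self-loops."""
    n = len(shots)
    adj = [[max(cosine(shots[i], shots[j]), 0.0) for j in range(n)] for i in range(n)]
    for i in range(n):
        adj[i][i] = 1.0  # self-loop keeps each shot's own feature
    deg = [sum(row) for row in adj]
    return [[adj[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)] for i in range(n)]


def gcn_layer(a_hat, h, w):
    """One graph-convolution step: ReLU(a_hat @ h @ w)."""
    n, d_in, d_out = len(h), len(w), len(w[0])
    # Aggregate neighbor features along graph edges.
    agg = [[sum(a_hat[i][k] * h[k][d] for k in range(n)) for d in range(d_in)]
           for i in range(n)]
    # Linear projection followed by ReLU.
    return [[max(sum(agg[i][d] * w[d][o] for d in range(d_in)), 0.0)
             for o in range(d_out)] for i in range(n)]


# Toy video: six 2-D frame features grouped into three shots.
frames = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0], [0.0, 1.0], [1.0, 0.0], [1.0, 0.0]]
boundaries = [(0, 2), (2, 4), (4, 6)]
shots = shot_features(frames, boundaries)
a_hat = normalized_adjacency(shots)
identity = [[1.0, 0.0], [0.0, 1.0]]  # placeholder for a learned weight matrix
encoded = gcn_layer(a_hat, shots, identity)
```

In this toy input the first and third shots are not temporal neighbors, yet the graph links them directly because their features are similar. That is the kind of long-distance, shot-level dependency a purely sequential model would have to carry through the intervening shot.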
