Unsupervised Video Summarization via Attention-Driven Adversarial Learning

This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video, and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of learned parameters, trains the model's components incrementally, and applies a stepwise, label-based strategy for updating the adversarial part. Subsequently, we introduce an attention mechanism to SUM-GAN-sl in two ways: (i) by integrating an attention layer within the variational auto-encoder (VAE) of the architecture (SUM-GAN-VAAE), and (ii) by replacing the VAE with a deterministic attention auto-encoder (SUM-GAN-AAE). Experimental evaluation on two datasets (SumMe and TVSum) shows that the attention auto-encoder contributes to faster and more stable training of the model, yielding a significant performance improvement over the original model and demonstrating that the proposed SUM-GAN-AAE is competitive with the state of the art. The software is publicly available at https://github.com/e-apostolidis/SUM-GAN-AAE.
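To make the idea of attention over video frames concrete, the following is a minimal sketch of a generic dot-product attention step, written in plain Python. It is an illustration only and is not taken from the SUM-GAN-AAE code: the function names (`softmax`, `dot`, `attend`), the toy frame vectors, and the query vector are all hypothetical, and the actual attention layer in the paper may score frames differently.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def dot(u, v):
    """Dot product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def attend(query, frames):
    """Generic dot-product attention (an assumption, not the paper's exact layer).

    query  : vector representing the current decoder state (hypothetical)
    frames : list of per-frame feature vectors from an encoder (hypothetical)
    Returns (weights, context): one weight per frame, summing to 1, and the
    weighted sum of the frame vectors.
    """
    scores = [dot(query, f) for f in frames]
    weights = softmax(scores)
    dim = len(frames[0])
    context = [sum(w * f[i] for w, f in zip(weights, frames)) for i in range(dim)]
    return weights, context


# Toy input: three 2-dimensional "frame" vectors and one query vector.
frames = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
query = [1.0, 1.0]
weights, context = attend(query, frames)
```

In this toy input the third frame has the largest dot product with the query, so it receives the largest weight; frames with higher weights contribute more to the context vector that a decoder would use for reconstruction.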
