Paper Information - Unsupervised Video Summarization via Attention-Driven Adversarial Learning

Unsupervised Video Summarization via Attention-Driven Adversarial Learning

This paper presents a new video summarization approach that integrates an attention mechanism to identify the significant parts of the video and is trained in an unsupervised manner via generative adversarial learning. Starting from the SUM-GAN model, we first develop an improved version of it (called SUM-GAN-sl) that has a significantly reduced number of learned parameters, performs incremental training of the model’s components, and applies a stepwise label-based strategy for updating the adversarial part. Subsequently, we introduce an attention mechanism to SUM-GAN-sl in two ways: (i) by integrating an attention layer within the variational auto-encoder (VAE) of the architecture (SUM-GAN-VAAE), and (ii) by replacing the VAE with a deterministic attention auto-encoder (SUM-GAN-AAE). Experimental evaluation on two datasets (SumMe and TVSum) documents the contribution of the attention auto-encoder to faster and more stable training of the model, resulting in a significant performance improvement with respect to the original model and demonstrating the competitiveness of the proposed SUM-GAN-AAE against the state of the art. The software is publicly available at https://github.com/e-apostolidis/SUM-GAN-AAE.
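As a rough illustration of what an attention step inside an auto-encoder computes, the sketch below applies generic dot-product attention over a sequence of per-frame encoder states. It is not the paper's formulation: the scoring function, the names `softmax` and `attention_context`, and the toy dimensions are all assumptions chosen for illustration, and the abstract does not specify how SUM-GAN-AAE scores frames.

```python
import numpy as np

def softmax(x):
    """Numerically stable softmax over a 1-D array."""
    shifted = x - np.max(x)
    exps = np.exp(shifted)
    return exps / exps.sum()

def attention_context(encoder_states, decoder_state):
    """Generic dot-product attention (illustrative, not the paper's exact layer).

    encoder_states: array of shape (T, d), one hidden state per video frame.
    decoder_state:  array of shape (d,), the current decoder hidden state.
    Returns the attention weights (T,) and the context vector (d,).
    """
    scores = encoder_states @ decoder_state   # one relevance score per frame
    weights = softmax(scores)                 # normalize scores to sum to 1
    context = weights @ encoder_states        # weighted sum of encoder states
    return weights, context

# Toy input: 6 frames with 4-dimensional hidden states (hypothetical sizes).
rng = np.random.default_rng(0)
frames = rng.standard_normal((6, 4))
query = rng.standard_normal(4)

weights, context = attention_context(frames, query)
print("attention weights:", weights.round(3))
print("context vector:", context.round(3))
```

The weights form a distribution over frames, so frames that score higher against the decoder state contribute more to the context vector used for reconstruction.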
