AC-SUM-GAN: Connecting Actor-Critic and Generative Adversarial Networks for Unsupervised Video Summarization

This paper presents a new method for unsupervised video summarization. The proposed architecture embeds an Actor-Critic model into a Generative Adversarial Network and formulates the selection of important video fragments (which are then used to form the summary) as a sequence generation task. The Actor and the Critic take part in a game that incrementally leads to the selection of the video key-fragments. At each step of the game, their choices result in a set of rewards from the Discriminator. The designed training workflow allows the Actor and the Critic to discover a space of actions and automatically learn a policy for key-fragment selection. Moreover, the proposed criterion for choosing the best model after training enables the automatic selection of suitable values for training parameters that are not learned from the data (such as the regularization factor $\sigma$). Experimental evaluation on two benchmark datasets (SumMe and TVSum) demonstrates that the proposed AC-SUM-GAN model performs consistently well. It achieves state-of-the-art (SoA) results among unsupervised methods, and these results are also competitive with those of supervised methods.
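The abstract frames key-fragment selection as a sequential game. The Actor picks fragments one at a time, the Critic evaluates those choices, and the Discriminator supplies the rewards. What follows is a minimal, heavily simplified sketch of that idea in plain Python, not the authors' implementation. Several parts are assumptions made purely for illustration:

- the 1-D fragment features;
- `discriminator_score`, a hypothetical stand-in that rewards summaries whose mean feature matches the whole video's mean;
- the Actor's per-fragment softmax preferences;
- the Critic's step-indexed value table;
- the learning rates.

The update is a standard one-step actor-critic. The Critic's temporal-difference error acts as the advantage that scales the Actor's policy-gradient step.

```python
import math
import random


def discriminator_score(features, selected):
    """HYPOTHETICAL stand-in for the Discriminator (not the paper's network):
    returns 0 when the mean feature of the selected fragments equals the mean
    of the whole video, and a more negative value the further apart they are."""
    video_mean = sum(features) / len(features)
    summary_mean = sum(features[i] for i in selected) / len(selected)
    return -abs(summary_mean - video_mean)


def policy(theta, available):
    """Actor: softmax over per-fragment preferences, restricted to the
    fragments that have not been selected yet (simplifying assumption)."""
    m = max(theta[i] for i in available)  # subtract max for numerical stability
    exps = {i: math.exp(theta[i] - m) for i in available}
    z = sum(exps.values())
    return {i: e / z for i, e in exps.items()}


def run_episode(features, k, theta, values, rng,
                lr_actor=0.1, lr_critic=0.1, gamma=1.0):
    """Select k fragments one at a time, updating the Critic (values) and the
    Actor (theta) after every step with a one-step TD actor-critic rule."""
    selected = []
    total_reward = 0.0
    for t in range(k):
        available = [i for i in range(len(features)) if i not in selected]
        probs = policy(theta, available)
        action = rng.choices(available, weights=[probs[i] for i in available])[0]
        selected.append(action)
        reward = discriminator_score(features, selected)  # per-step reward
        total_reward += reward
        # Critic: value of the current step index only (simplified state).
        next_value = values[t + 1] if t + 1 < k else 0.0
        td_error = reward + gamma * next_value - values[t]
        values[t] += lr_critic * td_error
        # Actor: gradient of log-softmax w.r.t. theta is 1[i == a] - pi(i).
        for i in available:
            grad = (1.0 if i == action else 0.0) - probs[i]
            theta[i] += lr_actor * td_error * grad
    return selected, total_reward


def train(features, k, episodes=500, seed=0):
    """Run many episodes and return the learned Actor preferences."""
    rng = random.Random(seed)
    theta = [0.0] * len(features)
    values = [0.0] * k
    for _ in range(episodes):
        run_episode(features, k, theta, values, rng)
    return theta


def greedy_summary(theta, k):
    """Pick the k fragments with the highest learned preference, in temporal order."""
    ranked = sorted(range(len(theta)), key=lambda i: -theta[i])
    return sorted(ranked[:k])


if __name__ == "__main__":
    features = [0.2, 0.9, 0.1, 0.5, 0.8, 0.3]  # toy per-fragment features
    theta = train(features, k=2)
    print(greedy_summary(theta, 2))
```

A real model would presumably replace these tables with learned networks that operate on frame features. This sketch only shows how a step-wise reward, combined with a learned value baseline, can drive a fragment-selection policy.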
