Multimodal Storytelling via Generative Adversarial Imitation Learning

Deriving event storylines is an effective summarization method for succinctly organizing extensive information, and it can significantly alleviate information overload. The critical challenge is the lack of a widely recognized definition of a storyline metric. Prior studies have developed various approaches based on different assumptions about users' interests. These approaches can extract interesting patterns, but their assumptions do not guarantee that the derived patterns match users' preferences. Moreover, because they rely exclusively on a single-modality source, they miss cross-modality information. This paper proposes a method, multimodal imitation learning via Generative Adversarial Networks (MIL-GAN), to directly model users' interests as reflected in various data. In particular, the proposed model addresses the critical challenge by imitating users' demonstrated storylines. The model is designed to learn the reward patterns from user-provided storylines and then apply the learned policy to unseen data. In a user study, the proposed approach is shown to capture users' implicit intent and to outperform competing methods by a substantial margin.
