暂无分享,去创建一个
[1] Bernt Schiele,et al. Coherent Multi-sentence Video Description with Variable Level of Detail , 2014, GCPR.
[2] Christopher D. Manning,et al. Get To The Point: Summarization with Pointer-Generator Networks , 2017, ACL.
[3] Chenliang Xu,et al. A Thousand Frames in Just a Few Words: Lingual Description of Videos through Latent Topics and Sparse Object Stitching , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[4] Emiel Krahmer,et al. Survey of the State of the Art in Natural Language Generation: Core tasks, applications and evaluation , 2017, J. Artif. Intell. Res..
[5] Juan Carlos Niebles,et al. Leveraging Video Descriptions to Learn Video Question Answering , 2016, AAAI.
[6] Raymond J. Mooney,et al. Learning to sportscast: a test of grounded language acquisition , 2008, ICML '08.
[7] D. Wilks. Canonical Correlation Analysis (CCA) , 2011 .
[8] Vadim Bulitko,et al. Automated Story Selection for Color Commentary in Sports , 2014, IEEE Transactions on Computational Intelligence and AI in Games.
[9] Petra Holtzmann,et al. Routledge Encyclopedia Of Narrative Theory , 2016 .
[10] C. Lawrence Zitnick,et al. Bringing Semantics into Focus Using Visual Abstraction , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition.
[11] Yueting Zhuang,et al. Video Question Answering via Gradually Refined Attention over Appearance and Motion , 2017, ACM Multimedia.
[12] Bernt Schiele,et al. Grounding Action Descriptions in Videos , 2013, TACL.
[13] Margaret Mitchell,et al. Generating Natural Questions About an Image , 2016, ACL.
[14] Mirella Lapata,et al. What’s This Movie About? A Joint Neural Network Architecture for Movie Content Analysis , 2018, NAACL.
[15] Louis-Philippe Morency,et al. Multimodal Machine Learning: A Survey and Taxonomy , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.
[16] Bernt Schiele,et al. A database for fine grained activity detection of cooking activities , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.
[17] Kyunghyun Cho,et al. Can neural machine translation do simultaneous translation? , 2016, ArXiv.
[18] Yoshua Bengio,et al. Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.
[19] Anoop Cherian,et al. Multimodal Attention for Fusion of Audio and Spatiotemporal Features for Video Description , 2018, CVPR Workshops.
[20] Gabriel Skantze,et al. Towards a General, Continuous Model of Turn-taking in Spoken Dialogue using LSTM Recurrent Neural Networks , 2017, SIGDIAL Conference.
[21] Nazli Ikizler-Cinbis,et al. Automatic Description Generation from Images: A Survey of Models, Datasets, and Evaluation Measures , 2016, J. Artif. Intell. Res..
[22] Lucia Specia,et al. Multimodal Lexical Translation , 2018, LREC.
[23] William B. Dolan,et al. Collecting Highly Parallel Data for Paraphrase Evaluation , 2011, ACL.
[24] Mirella Lapata,et al. Whodunnit? Crime Drama as a Case for Natural Language Understanding , 2018, Transactions of the Association for Computational Linguistics.
[25] Aren Jansen,et al. Audio Set: An ontology and human-labeled dataset for audio events , 2017, 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[26] Bernt Schiele,et al. A dataset for Movie Description , 2015, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[27] Mohan S. Kankanhalli,et al. Multimodal fusion for multimedia analysis: a survey , 2010, Multimedia Systems.
[28] Ronald J. Williams,et al. A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.
[29] P. Mermelstein,et al. Distance measures for speech recognition, psychological and instrumental , 1976 .
[30] Philipp Koehn,et al. Re-evaluating the Role of Bleu in Machine Translation Research , 2006, EACL.
[31] Jian Sun,et al. Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[32] Trevor Darrell,et al. Generating Visual Explanations , 2016, ECCV.
[33] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[34] Christopher Joseph Pal,et al. Using Descriptive Video Services to Create a Large Data Source for Video Annotation Research , 2015, ArXiv.
[35] Nick Campbell,et al. Doubly-Attentive Decoder for Multi-modal Neural Machine Translation , 2017, ACL.
[36] Mark Liberman,et al. Speaker identification on the SCOTUS corpus , 2008 .
[37] Jürgen Schmidhuber,et al. Long Short-Term Memory , 1997, Neural Computation.
[38] Jianfei Cai,et al. VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions , 2018, ECCV.
[39] Andrew Zisserman,et al. Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.
[40] Jeffrey Pennington,et al. GloVe: Global Vectors for Word Representation , 2014, EMNLP.
[41] Sanja Fidler,et al. MovieQA: Understanding Stories in Movies through Question-Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[42] Jimmy Ba,et al. Adam: A Method for Stochastic Optimization , 2014, ICLR.
[43] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.