暂无分享,去创建一个
Jie Zhou | Cheng Niu | Yang Feng | Jinchao Zhang | Zekang Li | Zongjia Li | Jie Zhou | Jinchao Zhang | Cheng Niu | Yang Feng | Zekang Li | Zongjia Li
[1] Margaret Mitchell,et al. VQA: Visual Question Answering , 2015, International Journal of Computer Vision.
[2] Alon Lavie,et al. Meteor Universal: Language Specific Translation Evaluation for Any Target Language , 2014, WMT@ACL.
[3] Jason Weston,et al. Learning to Speak and Act in a Fantasy Text Adventure Game , 2019, EMNLP.
[4] Chin-Yew Lin,et al. ROUGE: A Package for Automatic Evaluation of Summaries , 2004, ACL 2004.
[5] Cordelia Schmid,et al. VideoBERT: A Joint Model for Video and Language Representation Learning , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).
[6] George Kurian,et al. Google's Neural Machine Translation System: Bridging the Gap between Human and Machine Translation , 2016, ArXiv.
[7] Jason Weston,et al. Wizard of Wikipedia: Knowledge-Powered Conversational agents , 2018, ICLR.
[8] Tim K. Marks,et al. Audio Visual Scene-aware dialog (AVSD) Track for Natural Language Generation in DSTC7 , 2019 .
[9] Yash Goyal,et al. Making the V in VQA Matter: Elevating the Role of Image Understanding in Visual Question Answering , 2017, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Salim Roukos,et al. Bleu: a Method for Automatic Evaluation of Machine Translation , 2002, ACL.
[11] Danqi Chen,et al. CoQA: A Conversational Question Answering Challenge , 2018, TACL.
[12] Pascale Fung,et al. Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems , 2018, ACL.
[13] Ahmed El Kholy,et al. UNITER: Learning UNiversal Image-TExt Representations , 2019, ECCV 2020.
[14] Anoop Cherian,et al. End-to-end Audio Visual Scene-aware Dialog Using Multimodal Attention-based Video Features , 2018, ICASSP 2019 - 2019 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP).
[15] Fabio Viola,et al. The Kinetics Human Action Video Dataset , 2017, ArXiv.
[16] G. G. Stokes. "J." , 1890, The New Yale Book of Quotations.
[17] José M. F. Moura,et al. Visual Dialog , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[18] C. Lawrence Zitnick,et al. CIDEr: Consensus-based image description evaluation , 2014, 2015 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[19] Yejin Choi,et al. The Curious Case of Neural Text Degeneration , 2019, ICLR.
[20] Ming-Wei Chang,et al. BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding , 2019, NAACL.
[21] Yang Feng,et al. Incremental Transformer with Deliberation Decoder for Document Grounded Conversations , 2019, ACL.
[22] Florian Metze,et al. CMU Sinbad’s Submission for the DSTC7 AVSD Challenge , 2019 .
[23] Yu Cheng,et al. UNITER: UNiversal Image-TExt Representation Learning , 2019, ECCV.
[24] Alan W. Black,et al. A Dataset for Document Grounded Conversations , 2018, EMNLP.
[25] Thomas Wolf,et al. HuggingFace's Transformers: State-of-the-art Natural Language Processing , 2019, ArXiv.
[26] DSTC 7-AVSD : Scene-Aware Video-Dialogue Systems with Dual Attention , 2018 .
[27] Doyen Sahoo,et al. Multimodal Transformer Networks for End-to-End Video-Grounded Dialogue Systems , 2019, ACL.
[28] J. Koenderink. Q… , 2014, Les noms officiels des communes de Wallonie, de Bruxelles-Capitale et de la communaute germanophone.
[29] Tien Dat Nguyen,et al. From FiLM to Video: Multi-turn Question Answering with Multi-modal Context , 2018, ArXiv.
[30] Ilya Sutskever,et al. Language Models are Unsupervised Multitask Learners , 2019 .
[31] Ji Wang,et al. Pretraining-Based Natural Language Generation for Text Summarization , 2019, CoNLL.
[32] Thomas Wolf,et al. TransferTransfo: A Transfer Learning Approach for Neural Network Based Conversational Agents , 2019, ArXiv.