Anoop Cherian | Abhishek Das | Chiori Hori | Irfan Essa | Dhruv Batra | Tim K. Marks | Jue Wang | Devi Parikh | Raphael Gontijo Lopes | Huda AlAmri | Vincent Cartillier
[1] Subhashini Venugopalan, et al. Translating Videos to Natural Language Using Deep Recurrent Neural Networks, 2014, NAACL.
[2] Margaret Mitchell, et al. VQA: Visual Question Answering, 2015, International Journal of Computer Vision.
[3] Ali Farhadi, et al. Hollywood in Homes: Crowdsourcing Data Collection for Activity Understanding, 2016, ECCV.
[4] John R. Hershey, et al. Attention-Based Multimodal Fusion for Video Description, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[5] Hugo Larochelle, et al. GuessWhat?! Visual Object Discovery through Multi-modal Dialogue, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).
[6] Joelle Pineau, et al. The Ubuntu Dialogue Corpus: A Large Dataset for Research in Unstructured Multi-Turn Dialogue Systems, 2015, SIGDIAL Conference.
[7] Stefan Lee, et al. Learning Cooperative Visual Dialog Agents with Deep Reinforcement Learning, 2017, 2017 IEEE International Conference on Computer Vision (ICCV).
[8] Takaaki Hori, et al. End-to-end Conversation Modeling Track in DSTC6, 2017, ArXiv.
[9] José M. F. Moura, et al. Visual Dialog, 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).