A Visually-grounded First-person Dialogue Dataset with Verbal and Non-verbal Responses