Image-Enhanced Multi-level Sentence Representation Net for Natural Language Inference

The Natural Language Inference (NLI) task requires an agent to determine the semantic relation between a premise sentence (p) and a hypothesis sentence (h), which demands sufficient understanding of the sentences, from lexical knowledge to global semantics. Due to issues such as polysemy, ambiguity, and the fuzziness of sentences, fully understanding sentences remains challenging. To this end, we propose the Image-Enhanced Multi-Level Sentence Representation Net (IEMLRN), a novel architecture that utilizes images to enhance sentence semantic understanding at different scales. Specifically, we introduce the image corresponding to the sentences as reference information, which can be helpful for sentence semantic understanding and inference relation evaluation. Since image information might be related to sentence semantics at different scales, we design a multi-level architecture that understands sentences at different granularities and generates sentence representations more precisely. Experimental results on a large-scale NLI corpus and a real-world NLI-like corpus demonstrate that IEMLRN improves performance on both. Notably, IEMLRN significantly outperforms state-of-the-art sentence-encoding based models on the challenging hard subset and lexical subset of the SNLI corpus.
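The abstract does not specify the model in detail, but the core idea of fusing an image feature with sentence representations built at several granularities can be sketched roughly as follows. This is a minimal illustration only: the gating scheme, the mean/max pooling choices, and all function names here are assumptions for exposition, not the authors' actual IEMLRN architecture.

```python
import numpy as np

rng = np.random.default_rng(0)

def gate_fuse(text_feat, img_feat):
    """Blend a text feature with the image feature, gated by their similarity.

    This sigmoid-gated blend is a placeholder for whatever fusion IEMLRN
    actually uses; it is only meant to show image-as-reference fusion.
    """
    sim = text_feat @ img_feat / (
        np.linalg.norm(text_feat) * np.linalg.norm(img_feat) + 1e-8
    )
    g = 1.0 / (1.0 + np.exp(-sim))          # scalar gate in (0, 1)
    return g * text_feat + (1.0 - g) * img_feat

def multi_level_representation(word_vecs, img_feat):
    """Build word-, phrase-, and sentence-level views, each fused with the image."""
    word_level = word_vecs.mean(axis=0)                      # token granularity
    phrase_level = np.stack(
        [word_vecs[i:i + 2].mean(axis=0)                     # bigram granularity
         for i in range(len(word_vecs) - 1)]
    ).mean(axis=0)
    sent_level = word_vecs.max(axis=0)                       # global granularity
    fused = [gate_fuse(v, img_feat) for v in (word_level, phrase_level, sent_level)]
    return np.concatenate(fused)

d = 8
words = rng.normal(size=(5, d))   # toy word embeddings (e.g., GloVe vectors)
image = rng.normal(size=d)        # toy image feature (e.g., from a CNN encoder)
rep = multi_level_representation(words, image)
print(rep.shape)                  # three fused d-dim views concatenated: (24,)
```

In the real model the three granularities would come from learned encoders and the image feature from a pretrained CNN (e.g., VGG), with the fused multi-level representation of each sentence then fed to a relation classifier.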
