Linguistic Analysis of Multi-Modal Recurrent Neural Networks

Recurrent neural networks (RNNs) have gained a reputation for surpassing state-of-the-art results on many NLP benchmarks and for learning representations of words and larger linguistic units that encode complex syntactic and semantic structures. However, it is not straightforward to understand how exactly these models make their decisions. Recently, Li et al. (2015) developed methods to provide linguistically motivated analyses of RNNs trained for sentiment analysis. Here we focus on the analysis of a multi-modal Gated Recurrent Unit (GRU) architecture trained to predict image vectors (features extracted from images by a CNN trained on ImageNet) from their corresponding descriptions. We propose two methods to explore the importance of grammatical categories with respect to the model and the task. We observe that the model pays most attention to head words, nominal subjects, and adjectival modifiers, and least to determiners and coordinations.
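To make the setup concrete, the following is a minimal PyTorch sketch of such an architecture; the class name, dimensions, and mean-squared-error objective are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class MultiModalGRU(nn.Module):
    """Hypothetical sketch: a GRU reads a description and its final hidden
    state is projected to the dimensionality of a CNN image vector."""

    def __init__(self, vocab_size, embed_dim=300, hidden_dim=1024, image_dim=4096):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)
        self.gru = nn.GRU(embed_dim, hidden_dim, batch_first=True)
        self.project = nn.Linear(hidden_dim, image_dim)

    def forward(self, token_ids):
        # token_ids: (batch, seq_len) word indices of the description
        embedded = self.embed(token_ids)
        _, final_hidden = self.gru(embedded)          # (1, batch, hidden_dim)
        return self.project(final_hidden.squeeze(0))  # predicted image vector

# Training objective (assumed): make the predicted vector match the CNN
# feature vector of the paired image, e.g. with a mean-squared-error loss.
model = MultiModalGRU(vocab_size=10000)
tokens = torch.randint(0, 10000, (8, 12))  # toy batch of tokenized descriptions
image_vecs = torch.randn(8, 4096)          # toy CNN features for the paired images
loss = nn.functional.mse_loss(model(tokens), image_vecs)
```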