Learning to Compose Neural Networks for Question Answering

We describe a question answering model that applies to both images and structured knowledge bases. The model uses natural language strings to automatically assemble neural networks from a collection of composable modules. Parameters for these modules are learned jointly with network-assembly parameters via reinforcement learning, with only (world, question, answer) triples as supervision. Our approach, which we term a dynamic neural model network, achieves state-of-the-art results on benchmark datasets in both visual and structured domains.

[1]  Andrew Zisserman,et al.  Very Deep Convolutional Networks for Large-Scale Image Recognition , 2014, ICLR.

[2]  Yuandong Tian,et al.  Simple Baseline for Visual Question Answering , 2015, ArXiv.

[3]  Richard Socher,et al.  A Neural Network for Factoid Question Answering over Paragraphs , 2014, EMNLP.

[4]  Andrew Y. Ng,et al.  Parsing with Compositional Vector Grammars , 2013, ACL.

[5]  Kate Saenko,et al.  Ask, Attend and Answer: Exploring Question-Guided Spatial Attention for Visual Question Answering , 2015, ECCV.

[6]  Hang Li,et al.  Neural Enquirer: Learning to Query Tables , 2015, ArXiv.

[7]  Dan Klein,et al.  Neural Module Networks , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Percy Liang,et al.  Compositional Semantic Parsing on Semi-Structured Tables , 2015, ACL.

[9]  Yoshua Bengio,et al.  Global training of document processing systems using graph transformer networks , 1997, Proceedings of IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[10]  Christopher D. Manning,et al.  The Stanford Typed Dependencies Representation , 2008, CF+CDPE@COLING.

[11]  Jacob Andreas,et al.  Semantic Parsing as Machine Translation , 2013, ACL.

[12]  Ronald J. Williams,et al.  Simple Statistical Gradient-Following Algorithms for Connectionist Reinforcement Learning , 2004, Machine Learning.

[13]  Léon Bottou,et al.  From machine learning to machine reasoning , 2011, Machine Learning.

[14]  Margaret Mitchell,et al.  VQA: Visual Question Answering , 2015, International Journal of Computer Vision.

[15]  Dan Klein,et al.  Deep Compositional Question Answering with Neural Module Networks , 2015, ArXiv.

[16]  Richard S. Zemel,et al.  Exploring Models and Data for Image Question Answering , 2015, NIPS.

[17]  Eunsol Choi,et al.  Scaling Semantic Parsers with On-the-Fly Ontology Matching , 2013, EMNLP.

[18]  Jason Weston,et al.  Question Answering with Subgraph Embeddings , 2014, EMNLP.

[19]  Richard S. Zemel,et al.  Image Question Answering: A Visual Semantic Embedding Model and a New Dataset , 2015, ArXiv.

[20]  Jiasen Lu,et al.  VQA: Visual Question Answering , 2015, ICCV.

[21]  Cuong Chau,et al.  Montague Meets Markov: Deep Semantics with Probabilistic Logical Form , 2013, *SEMEVAL.

[22]  Jayant Krishnamurthy,et al.  Jointly Learning to Parse and Perceive: Connecting Natural Language to the Physical World , 2013, TACL.

[23]  Mark Steedman,et al.  Inducing Probabilistic CCG Grammars from Logical Form with Higher-Order Unification , 2010, EMNLP.

[24]  Luke S. Zettlemoyer,et al.  A Joint Model of Language and Perception for Grounded Attribute Learning , 2012, ICML.

[25]  Jude W. Shavlik,et al.  Knowledge-Based Artificial Neural Networks , 1994, Artif. Intell..

[26]  Yoshua Bengio,et al.  Show, Attend and Tell: Neural Image Caption Generation with Visual Attention , 2015, ICML.

[27]  Dan Klein,et al.  Learning Dependency-Based Compositional Semantics , 2011, CL.

[28]  Raymond J. Mooney,et al.  Learning Synchronous Grammars for Semantic Parsing with Lambda Calculus , 2007, ACL.

[29]  Mario Fritz,et al.  Ask Your Neurons: A Neural-Based Approach to Answering Questions about Images , 2015, 2015 IEEE International Conference on Computer Vision (ICCV).

[30]  Bohyung Han,et al.  Image Question Answering Using Convolutional Neural Network with Dynamic Parameter Prediction , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[31]  Edward Grefenstette,et al.  Towards a Formal Distributional Semantics: Simulating Logical Calculi with Tensors , 2013, *SEMEVAL.

[32]  Alexander J. Smola,et al.  Stacked Attention Networks for Image Question Answering , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[33]  Tom M. Mitchell,et al.  Vector Space Semantic Parsing: A Framework for Compositional Vector Space Models , 2013, CVSM@ACL.

[34]  Matthew D. Zeiler ADADELTA: An Adaptive Learning Rate Method , 2012, ArXiv.

[35]  Jonathan Berant,et al.  Semantic Parsing via Paraphrasing , 2014, ACL.

[36]  Phil Blunsom,et al.  Teaching Machines to Read and Comprehend , 2015, NIPS.

[37]  Zhengdong Lu,et al.  Neural Enquirer: Learning to Query Tables in Natural Language , 2016, IEEE Data Eng. Bull..

[38]  Mark Steedman,et al.  Combined Distributional and Logical Semantics , 2013, TACL.