Grounded PCFG Induction with Images

Recent work in unsupervised parsing has tried to incorporate visual information into learning, but results suggest that these models need linguistic bias to compete with models that rely on text alone. This work proposes grammar induction models that use visual information from images for labeled parsing, and achieves state-of-the-art results on grounded grammar induction in several languages. Results indicate that visual information is especially helpful in languages whose high-frequency words are more broadly distributed. Comparison between models with and without visual information shows that the grounded models use visual information to propose noun phrases, gather useful information from images for unknown words, and achieve better performance on prepositional phrase attachment prediction.
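At the core of PCFG induction is the inside algorithm, which computes the total probability that a grammar assigns to a sentence; induction models maximize this quantity, and grounded variants additionally condition rule probabilities on image embeddings. The following is a minimal illustrative sketch with a hand-written toy grammar in Chomsky normal form; all symbols and probabilities are invented for illustration and are not the paper's induced grammars:

```python
from collections import defaultdict

# Toy PCFG in Chomsky normal form. Every rule and probability here is
# hypothetical; in induction these would be learned (and, in grounded
# models, conditioned on image features).
binary_rules = {          # (left_child, right_child) -> [(parent, prob), ...]
    ("NP", "VP"): [("S", 1.0)],
    ("DT", "NN"): [("NP", 0.7)],
    ("NP", "PP"): [("NP", 0.3)],
    ("V", "NP"):  [("VP", 1.0)],
    ("P", "NP"):  [("PP", 1.0)],
}
lexical_rules = {         # word -> [(preterminal, prob), ...]
    "the":  [("DT", 1.0)],
    "dog":  [("NN", 0.5)],
    "park": [("NN", 0.5)],
    "saw":  [("V", 1.0)],
    "in":   [("P", 1.0)],
}

def inside(sentence):
    """CKY-style inside algorithm: chart[i][j][A] = P(A derives words i..j-1)."""
    n = len(sentence)
    chart = [[defaultdict(float) for _ in range(n + 1)] for _ in range(n + 1)]
    # Base case: preterminal spans of length 1.
    for i, word in enumerate(sentence):
        for tag, p in lexical_rules.get(word, []):
            chart[i][i + 1][tag] += p
    # Recursive case: sum over all split points and binary rules.
    for span in range(2, n + 1):
        for i in range(n - span + 1):
            j = i + span
            for k in range(i + 1, j):
                for left, pl in chart[i][k].items():
                    for right, pr in chart[k][j].items():
                        for parent, pp in binary_rules.get((left, right), []):
                            chart[i][j][parent] += pp * pl * pr
    return chart[0][n]["S"]  # total probability of all parses rooted in S

prob = inside("the dog saw the park".split())
# S -> NP VP with NP(0,2)=0.35, VP(2,5)=0.35, so prob = 0.1225
```

Real induction systems run this computation in log space over all candidate categories and backpropagate through it to fit rule probabilities; the grounded models described above would replace the fixed probability tables with outputs of a network that also sees an image embedding.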
