Category-theoretic quantitative compositional distributional models of natural language semantics

This thesis addresses the problem of compositionality in distributional semantics. Distributional semantics presupposes that the meanings of words are a function of their occurrences in textual contexts. It models words as distributions over these contexts and represents them as vectors in high-dimensional spaces. The problem of compositionality for such models is how to produce representations for larger units of text by composing the representations of smaller units. This thesis focuses on one approach to this problem: the categorical framework developed by Coecke, Sadrzadeh, and Clark (DisCoCat), which combines syntactic analysis formalisms with distributional semantic representations of meaning to produce syntactically motivated composition operations. The thesis shows how this approach can be theoretically extended and practically implemented to produce concrete compositional distributional models of natural language semantics. It furthermore demonstrates that such models can perform on par with, or better than, competing approaches in the field of natural language processing. The thesis makes three principal contributions to computational linguistics. The first is to extend the DisCoCat framework on both the syntactic and semantic fronts, incorporating a number of syntactic analysis formalisms and providing learning procedures that allow concrete compositional distributional models to be generated. The second is to evaluate the models developed from the procedures presented here, showing that they outperform other compositional distributional models in the literature. The third is to show that using category theory to solve linguistic problems forms a sound basis for research, illustrated by examples of work on this topic that also suggest directions for future research.
