Syntactic control for compositional vector space models

The framework of Compositional Distributional Semantics unifies vector space models for lexical meanings with a compositional account of how these meanings combine into phrases and larger units. The syntactic engines that have been used to drive the interpretation process (Lambek grammars, pregroups) are problematic in two respects, overgeneration and undergeneration, which compromise the accuracy of the quantitative values associated with a derivation. We address these problems by moving to a non-symmetric, non-associative, non-unital type logic with a tree-building tensor operation, generating phrases rather than strings. Composition (tensor) and decomposition (cotensor) of phrases are treated on a par. Reordering and restructuring are controlled by adjoint pairs of modalities, the grammatical analogues of Linear Logic's '!'. We discuss the categorical structures for this model of syntax and the associated graphical language, and we identify some empirical areas where the model leads to improved performance.

In the field of natural language semantics, the compositional distributional framework of [3] and subsequent work (see [9] for an overview of results obtained so far) has achieved remarkable progress by unifying vector space models for lexical meanings with a compositional account of how these meanings combine into phrases and larger units. Interpretation takes the form of a functorial transition from Form to Meaning: a structure-preserving map that associates the operations for building syntactic structure with vector composition operations, thus assigning quantitative values to these structures. The quality of the quantitative values thus obtained is determined by the accuracy of the syntactic engine driving the interpretation process. Compositional Distributional Semantics has used type logics for that purpose: Lambek's original Syntactic Calculus (L) and its more recent Pregroup incarnation (PG). Categorically, these are systems with a (non-symmetric) monoidal bi-closed or compact closed structure, respectively. As models of natural language syntax, these calculi are lacking in two respects: overgeneration and undergeneration. Both L and PG model the composition of phrases with an associative multiplicative tensor operation, claiming in fact that no aspect of grammatical organization beyond linear order can affect