Distributional Models for Corpus-Based Syntax Learning

The current best-performing methods in broad-coverage corpus-based syntax (tree) learning are based on linear distributional methods (Clark, 2001; Klein and Manning, 2001). In particular, they have so far greatly outperformed methods which learn (P)CFG grammars, despite the linguistic appeal of such local, recursive models. Broadly, this seems to be because decisions made by distributional methods are directly backed by superficially observable data, while (P)CFG learning requires careful construction of intermediate syntactic structures whose benefit to a model's quality is only indirectly observable, if at all. In this abstract, we first describe how linear distributional methods can be applied to the induction of tree structures, then discuss a nested but non-recursive model of tree structure, and finally outline two extensions of this basic model, which are currently under investigation.