Paper information - Advances in discriminative dependency parsing


Achieving a greater understanding of natural language syntax and parsing is a critical step in producing useful natural language processing systems. In this thesis, we focus on the formalism of dependency grammar as it allows one to model important head-modifier relationships with a minimum of extraneous structure. Recent research in dependency parsing has highlighted the discriminative structured prediction framework (McDonald et al., 2005a; Carreras, 2007; Suzuki et al., 2009), which is characterized by two advantages: first, the availability of powerful discriminative learning algorithms like log-linear and max-margin models (Lafferty et al., 2001; Taskar et al., 2003), and second, the ability to use arbitrarily-defined feature representations. This thesis explores three advances in the field of discriminative dependency parsing. First, we show that the classic Matrix-Tree Theorem (Kirchhoff, 1847; Tutte, 1984) can be applied to the problem of non-projective dependency parsing, enabling both log-linear and max-margin parameter estimation in this setting. Second, we present novel third-order dependency parsing algorithms that extend the amount of context available to discriminative parsers while retaining computational complexity equivalent to existing second-order parsers. Finally, we describe a simple but effective method for augmenting the features of a dependency parser with information derived from standard clustering algorithms; our semi-supervised approach is able to deliver consistent benefits regardless of the amount of available training data.
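As a rough illustration of the first contribution, the sketch below shows how the Matrix-Tree Theorem turns a sum over all non-projective dependency trees into a single determinant. It is a minimal sketch under stated assumptions, not the implementation from the thesis: the function names (`determinant`, `partition_function`, `brute_force`), the convention that `weights[h][m]` is the weight of the arc from head `h` to modifier `m`, the use of index 0 as an artificial root, and the choice to let several words attach to that root are all assumptions made here for illustration.

```python
from fractions import Fraction
from itertools import product


def determinant(matrix):
    """Determinant by Gaussian elimination over exact fractions."""
    a = [[Fraction(x) for x in row] for row in matrix]
    n = len(a)
    det = Fraction(1)
    for col in range(n):
        # Find a row at or below the diagonal with a non-zero pivot.
        pivot = next((r for r in range(col, n) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign
        det *= a[col][col]
        for r in range(col + 1, n):
            factor = a[r][col] / a[col][col]
            for c in range(col, n):
                a[r][c] -= factor * a[col][c]
    return det


def partition_function(weights):
    """Sum of the weights of all dependency trees, via one determinant.

    weights[h][m] is the weight of arc h -> m (hypothetical convention);
    index 0 is an artificial root, indices 1..n are words.
    """
    n = len(weights) - 1
    laplacian = [[0] * n for _ in range(n)]
    for m in range(1, n + 1):
        for h in range(n + 1):
            if h == m:
                continue
            # Diagonal: total weight of arcs entering word m (root included).
            laplacian[m - 1][m - 1] += weights[h][m]
            # Off-diagonal: negated weight of the word-to-word arc h -> m.
            if h >= 1:
                laplacian[h - 1][m - 1] -= weights[h][m]
    return determinant(laplacian)


def brute_force(weights):
    """Same sum by enumerating every head assignment (exponential time)."""
    n = len(weights) - 1
    total = 0
    for heads in product(range(n + 1), repeat=n):
        valid = True
        for m in range(1, n + 1):
            # Follow head links from m; a valid tree always reaches the root.
            seen, node = set(), m
            while node != 0 and node not in seen:
                seen.add(node)
                node = heads[node - 1]
            if node != 0:
                valid = False
                break
        if valid:
            weight = 1
            for m in range(1, n + 1):
                weight *= weights[heads[m - 1]][m]
            total += weight
    return total


# Arbitrary example weights for a three-word sentence (column 0 is unused).
example = [
    [0, 2, 1, 3],
    [0, 0, 4, 1],
    [0, 5, 0, 2],
    [0, 1, 3, 0],
]
assert partition_function(example) == brute_force(example)
```

The determinant costs cubic time in the sentence length, whereas the enumeration grows exponentially, which is what makes quantities such as the log-linear normalizer tractable in the non-projective setting. With all arc weights set to 1 on a three-word sentence, both functions return 16, the number of spanning trees rooted at the artificial root.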

Terry Koo

[1] John Cocke, et al. Programming languages and their compilers: Preliminary notes, 1969.

[2] Qun Liu, et al. Forest-Based Translation, 2008, ACL.

[3] Corinna Cortes, et al. Support-Vector Networks, 1995, Machine Learning.

[4] Robert L. Mercer, et al. Class-Based n-gram Models of Natural Language, 1992, CL.

[5] L. Baum, et al. A Maximization Technique Occurring in the Statistical Analysis of Probabilistic Functions of Markov Chains, 1970.

[6] L. Bregman. The relaxation method of finding the common point of convex sets and its application to the solution of problems in convex programming, 1967.

[7] Michael Collins, et al. Discriminative Training Methods for Hidden Markov Models: Theory and Experiments with Perceptron Algorithms, 2002, EMNLP.

[8] Jason Eisner, et al. Bilexical Grammars and their Cubic-Time Parsing Algorithms, 2000.

[9] Ben Taskar, et al. Exponentiated Gradient Algorithms for Large-margin Structured Classification, 2004, NIPS.

[10] Andrew McCallum, et al. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data, 2001, ICML.

[11] Noah A. Smith, et al. Contrastive Estimation: Training Log-Linear Models on Unlabeled Data, 2005, ACL.