In a series of recent publications (most notably Boersma (1998); see also Boersma and Hayes (2001)), Paul Boersma has developed a stochastic generalization of standard Optimality Theory in the sense of Prince and Smolensky (1993). While a classical OT grammar maps a set of candidates to its optimal element (or elements), in Boersma's Stochastic Optimality Theory (StOT for short) a grammar defines a probability distribution over such a set. Boersma also developed a natural learning algorithm, the Gradual Learning Algorithm (GLA), which induces a StOT grammar from a corpus. StOT is able to cope with natural language phenomena such as ambiguity, optionality, and gradient grammaticality that are notoriously problematic for standard OT. Keller and Asudeh (2002) raise several criticisms against StOT in general and the GLA in particular. Partly in reaction to this, Goldwater and Johnson (2003) point out that maximum entropy (ME) models, which are widely used in computational linguistics, might be an alternative to StOT. ME models are similar enough to StOT that empirical results reached in the former framework can be transferred to the latter, and ME models arguably have better formal properties than StOT. On the other hand, the GLA has greater cognitive plausibility than the standard learning algorithms for ME models (as can be seen from Boersma and Levelt (2000)). In this paper I argue that it is possible to combine the advantages of StOT with those of the ME model. It can be shown that the GLA can be adapted to ME models almost without modification; put differently, it turns out that the GLA is the single most natural on-line learning algorithm for ME models. Keller and Asudeh's criticism, to the degree that it is justified, does not apply to the combination of ME evaluation with GLA learning, and the cognitive advantages of the GLA are preserved.
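To make the central claim concrete, here is a minimal sketch (not from the paper) of an ME evaluation component trained by a GLA-style online update, which amounts to stochastic gradient ascent on the conditional log-likelihood: each observed winner triggers a small, gradual adjustment of the constraint weights by the difference between the model's expected violations and the winner's violations. The tableau, violation counts, and learning rate below are hypothetical illustrations.

```python
import math
import random

def me_probabilities(candidates, weights):
    """Maximum entropy (log-linear) distribution over a candidate set.

    candidates: one constraint-violation count vector per candidate;
    weights: one penalty weight per constraint (higher = stronger penalty).
    """
    scores = [math.exp(-sum(w * v for w, v in zip(weights, cand)))
              for cand in candidates]
    z = sum(scores)
    return [s / z for s in scores]

def gla_style_update(candidates, observed, weights, rate=0.1):
    """One online update: stochastic gradient ascent on log-likelihood.

    As with the GLA, a single observed winner nudges each weight by the
    difference between the model's expected violation counts and the
    winner's actual violation counts.
    """
    probs = me_probabilities(candidates, weights)
    expected = [sum(p * cand[k] for p, cand in zip(probs, candidates))
                for k in range(len(weights))]
    return [w + rate * (e - o)
            for w, e, o in zip(weights, expected, candidates[observed])]

# Hypothetical tableau: two candidates, two constraints.
tableau = [[1, 0],   # candidate A violates constraint 1 once
           [0, 2]]   # candidate B violates constraint 2 twice
weights = [0.0, 0.0]
random.seed(0)
for _ in range(5000):
    winner = 0 if random.random() < 0.75 else 1  # A observed 75% of the time
    weights = gla_style_update(tableau, winner, weights)
print(me_probabilities(tableau, weights))  # hovers near [0.75, 0.25]
```

Because the conditional log-likelihood of an ME model is concave, this gradual online procedure drives the model's candidate probabilities toward the observed frequencies; with a constant learning rate the weights keep hovering around the optimum rather than converging exactly.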
[1] F. Keller and A. Asudeh. Probabilistic Learning Algorithms and Optimality Theory. Linguistic Inquiry, 2002.
[2] S. P. Abney. Stochastic Attribute-Value Grammars. Computational Linguistics, 1996.
[3] P. Boersma and C. Levelt. Gradual constraint-ranking learning algorithm predicts acquisition order. 1999.
[4] S. Goldwater and M. Johnson. Learning OT constraint rankings using a maximum entropy model. 2003.
[5] P. Boersma and B. Hayes. Empirical Tests of the Gradual Learning Algorithm. Linguistic Inquiry, 2001.
[6] J. D. Lafferty et al. Inducing Features of Random Fields. IEEE Transactions on Pattern Analysis and Machine Intelligence, 1995.
[7] C. C. Levelt et al. Syllable Types in Cross-linguistic and Developmental Grammars. 1998.
[8] A. Prince and P. Smolensky. Optimality Theory: Constraint Interaction in Generative Grammar. 2004.
[9] A. L. Berger et al. A Maximum Entropy Approach to Natural Language Processing. Computational Linguistics, 1996.