Learning Bias and Phonological-Rule Induction

A fundamental debate in the machine learning of language concerns the role of prior knowledge in the learning process. Purely nativist approaches, such as the Principles and Parameters model, build parameterized linguistic generalizations directly into the learning system. Purely empiricist approaches use a general, domain-independent learning rule (such as error back-propagation, instance-based generalization, or minimum description length) to learn linguistic generalizations directly from the data. In this paper we suggest that an alternative to the purely nativist and purely empiricist learning paradigms is to represent the prior knowledge of language as a set of abstract learning biases that guide an empirical inductive learning algorithm.

We test this idea by examining the machine learning of simple Sound Pattern of English (SPE)-style phonological rules. We represent phonological rules as finite-state transducers that accept underlying forms as input and generate surface forms as output. We show that OSTIA, a general-purpose transducer induction algorithm, is incapable of learning simple phonological rules like flapping. We then augment OSTIA with three kinds of learning biases that are specific to natural language phonology and that are assumed, explicitly or implicitly, by every theory of phonology: faithfulness (underlying segments tend to be realized similarly on the surface), community (similar segments behave similarly), and context (phonological rules need access to variables in their context). These biases are so fundamental to generative phonology that they are left implicit in many theories, but explicitly modifying the OSTIA algorithm with them allows it to learn more compact, accurate, and general transducers, and our implementation successfully learns a number of rules from English and German. Furthermore, we show that some of the remaining errors in our augmented model are due to implicit biases in the traditional SPE-style rewrite system that are not similarly represented in the transducer formalism, suggesting that while transducers may be formally equivalent to SPE-style rules, they may not have identical evaluation procedures.

Because our biases were applied to the learning of very simple SPE-style rules, and to a nonprobabilistic, not psychologically motivated theory of purely deterministic transducers, we do not expect that our model as implemented has any practical use as a phonological learning device, nor is it intended as a cognitive model of human learning. Indeed, because of the noise and nondeterminism inherent in linguistic data, we feel strongly that stochastic algorithms for language induction are much more likely to be a fruitful research direction. Our model is instead intended to suggest the kinds of biases that may be added to other empiricist induction models, and the way in which they may be added, in order to build a cognitively and computationally plausible learning model for phonological rules.
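To make the transducer formalism concrete, the sketch below hand-codes a flapping rule (roughly, underlying /t/ surfaces as a flap between a stressed and an unstressed vowel) as a deterministic finite-state transducer of the kind the paper asks OSTIA to induce. The segment notation, the simplified rule environment, and the function name flap are illustrative assumptions rather than the paper's implementation; in the paper the machine is learned from underlying/surface pairs, not written by hand.

```python
# Minimal sketch of SPE-style flapping (t -> flap / stressed V __ unstressed V)
# as a deterministic transducer mapping underlying forms to surface forms.
# The inventory, the stress notation (V' = stressed vowel), and "dx" for the
# flap (as in ARPAbet) are assumptions for illustration only.

STRESSED = {"a'", "e'", "i'", "o'", "u'"}   # stressed vowels (assumed notation)
UNSTRESSED = {"a", "e", "i", "o", "u"}      # unstressed vowels

def flap(underlying):
    """Map a list of underlying segments to a list of surface segments.

    States:
      0 -- no stressed vowel pending
      1 -- just saw a stressed vowel
      2 -- saw a stressed vowel then /t/; output is delayed until we
           know whether an unstressed vowel follows
    """
    state, out = 0, []
    for seg in underlying:
        if state == 0:
            out.append(seg)
            state = 1 if seg in STRESSED else 0
        elif state == 1:
            if seg == "t":
                state = 2                    # delay output for the /t/
            else:
                out.append(seg)
                state = 1 if seg in STRESSED else 0
        else:  # state == 2: a /t/ after a stressed vowel is pending
            if seg in UNSTRESSED:
                out.extend(["dx", seg])      # flapping environment met
                state = 0
            else:
                out.extend(["t", seg])       # environment not met: plain /t/
                state = 1 if seg in STRESSED else 0
    if state == 2:
        out.append("t")                      # word-final /t/ never flaps here
    return out

# "butter" with initial stress: /b u' t e r/ -> [b u' dx e r]
print(flap(["b", "u'", "t", "e", "r"]))
# word-final /t/ is unchanged: /k a' t/ -> [k a' t]
print(flap(["k", "a'", "t"]))
```

The delayed output in state 2 hints at why such rules are hard to induce: because flapping depends on right context, the machine cannot emit anything for /t/ until it has seen the following segment, and a learner like OSTIA must discover that delay from the data alone.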
