Automatic Induction of Finite State Transducers for Simple Phonological Rules

This paper presents a method for learning phonological rules from sample pairs of underlying and surface forms, without negative evidence. The learned rules are represented as finite state transducers that accept underlying forms as input and generate surface forms as output. The learning algorithm is an extension of OSTIA, an algorithm for learning general subsequential finite state transducers. OSTIA can learn arbitrary subsequential transducers in the limit. Even so, large dictionaries of actual English pronunciations did not provide enough samples for it to induce phonological rules correctly. We therefore augmented OSTIA with two kinds of knowledge specific to natural language phonology, in the form of biases from "universal grammar". The first bias is that underlying phones are often realized as phonetically similar or identical surface phones. The second favors phonological rules that apply across natural phonological classes. These additions helped the algorithm learn more compact, accurate, and general transducers than unmodified OSTIA. An implementation of the algorithm successfully learns a number of English postlexical rules.
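OSTIA, from Oncina, García and Vidal, first builds an onward prefix-tree transducer from the sample pairs and then generalizes it by merging states. The following is a minimal Python sketch of the first stage only. It makes several assumptions: phones are strings, and the samples are functional, meaning each underlying form has one surface form. The class `OnwardTreeTransducer`, the helper `lcp`, and the toy transcriptions are hypothetical illustrations, not the paper's code. State merging, where OSTIA actually generalizes, is omitted.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Phones = Tuple[str, ...]


def lcp(seqs: List[Phones]) -> Phones:
    """Longest common prefix of a non-empty list of phone sequences."""
    prefix = seqs[0]
    for s in seqs[1:]:
        n = 0
        while n < len(prefix) and n < len(s) and prefix[n] == s[n]:
            n += 1
        prefix = prefix[:n]
    return prefix


class OnwardTreeTransducer:
    """Hypothetical sketch: prefix-tree transducer over input prefixes, made onward.

    Assumes the samples are functional; if an input repeats, the last
    output given for it is the one the transducer reproduces.
    """

    def __init__(self, pairs: Sequence[Tuple[Phones, Phones]]):
        # For every input prefix u, collect the surface forms of all samples
        # whose underlying form starts with u.
        below: Dict[Phones, List[Phones]] = {}
        for inp, out in pairs:
            for k in range(len(inp) + 1):
                below.setdefault(inp[:k], []).append(out)
        # Onward: reaching state u has already emitted the longest output
        # prefix shared by every sample that passes through u.
        self.common: Dict[Phones, Phones] = {u: lcp(outs) for u, outs in below.items()}
        self.initial_output = self.common[()]
        self.edges: Dict[Tuple[Phones, str], Phones] = {}
        for u in below:
            if u:
                parent = u[:-1]
                # Edge output: what u commits to beyond what its parent emitted.
                self.edges[(parent, u[-1])] = self.common[u][len(self.common[parent]):]
        # Final-state outputs flush whatever a complete input still owes.
        self.final: Dict[Phones, Phones] = {
            inp: out[len(self.common[inp]):] for inp, out in pairs
        }

    def transduce(self, inp: Phones) -> Optional[Phones]:
        """Map an underlying form to a surface form, or None if it is not accepted."""
        out: List[str] = list(self.initial_output)
        state: Phones = ()
        for phone in inp:
            step = self.edges.get((state, phone))
            if step is None:
                return None
            out.extend(step)
            state = state + (phone,)
        if state not in self.final:
            return None
        return tuple(out) + self.final[state]


# Toy samples (hypothetical ARPAbet-style transcriptions) of simplified flapping:
# /t/ surfaces as [dx] before a vowel but stays [t] word-finally.
SAMPLES = [
    (("b", "ah", "t", "er"), ("b", "ah", "dx", "er")),  # butter
    (("b", "ah", "t"), ("b", "ah", "t")),               # but
    (("b", "ah", "s"), ("b", "ah", "s")),               # bus
]
tree = OnwardTreeTransducer(SAMPLES)
for underlying, _ in SAMPLES:
    print(" ".join(underlying), "->", " ".join(tree.transduce(underlying)))
```

In the resulting tree, the edge on `t` leaving state `b ah` emits nothing. The choice between [t] and [dx] is delayed until the next phone, where the edge on `er` emits `dx er`, or until the end of the word, where the final output is `t`. This delayed output is how a subsequential transducer can express a rule whose conditioning context lies to the right of the target phone.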

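The first bias holds that an underlying phone is usually realized as a phonetically similar or identical surface phone. One way to make it concrete is to align each underlying form with its surface form so that similar phones line up. The following minimal sketch uses Wagner and Fischer's dynamic-programming edit distance with feature-weighted substitution costs. It is not the paper's exact procedure: the feature table `FEATURES`, the cost `INDEL`, and the functions `sub_cost` and `align` are hypothetical.

```python
import math
from typing import Dict, FrozenSet, List, Optional, Sequence, Tuple

# Hypothetical, heavily simplified feature bundles for a few ARPAbet-style
# phones; a real system would use a full articulatory feature set.
FEATURES: Dict[str, FrozenSet[str]] = {
    "t": frozenset({"consonant", "coronal", "stop"}),
    "d": frozenset({"consonant", "coronal", "stop", "voiced"}),
    "dx": frozenset({"consonant", "coronal", "flap", "voiced"}),
    "s": frozenset({"consonant", "coronal", "fricative"}),
    "b": frozenset({"consonant", "labial", "stop", "voiced"}),
    "ah": frozenset({"vowel", "voiced", "back"}),
    "er": frozenset({"vowel", "voiced", "rhotic"}),
}
INDEL = 1.0  # assumed cost of deleting an underlying or inserting a surface phone

Pair = Tuple[Optional[str], Optional[str]]


def sub_cost(a: str, b: str) -> float:
    """Fraction of features on which two phones differ (0.0 means identical)."""
    fa, fb = FEATURES[a], FEATURES[b]
    return len(fa ^ fb) / len(fa | fb)


def align(underlying: Sequence[str], surface: Sequence[str]) -> Tuple[float, List[Pair]]:
    """Minimum-cost alignment; None marks a deleted or inserted phone."""
    n, m = len(underlying), len(surface)
    cost = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        cost[i][0] = i * INDEL
    for j in range(1, m + 1):
        cost[0][j] = j * INDEL
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            cost[i][j] = min(
                cost[i - 1][j - 1] + sub_cost(underlying[i - 1], surface[j - 1]),
                cost[i - 1][j] + INDEL,  # underlying phone deleted
                cost[i][j - 1] + INDEL,  # surface phone inserted
            )
    # Trace back from the bottom-right cell, preferring substitutions on ties.
    pairs: List[Pair] = []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and math.isclose(
            cost[i][j], cost[i - 1][j - 1] + sub_cost(underlying[i - 1], surface[j - 1])
        ):
            pairs.append((underlying[i - 1], surface[j - 1]))
            i, j = i - 1, j - 1
        elif i > 0 and math.isclose(cost[i][j], cost[i - 1][j] + INDEL):
            pairs.append((underlying[i - 1], None))
            i -= 1
        else:
            pairs.append((None, surface[j - 1]))
            j -= 1
    pairs.reverse()
    return cost[n][m], pairs


total, pairs = align(["b", "ah", "t", "er"], ["b", "ah", "dx", "er"])
print(total, pairs)
```

With these assumed costs, `t` aligns with `dx` at cost 0.6. A deletion followed by an insertion would cost 2.0. The alignment therefore pairs each underlying phone with its most similar surface counterpart, which is the kind of correspondence this bias is meant to favor.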