Constraint-based Learning of Phonological Processes

Phonological processes are context-dependent sound changes in natural languages. We present an unsupervised approach to learning human-readable descriptions of phonological processes from collections of related utterances. Our approach builds upon a technique from the programming languages community called *constraint-based program synthesis*. We contribute a novel encoding of the learning problem into Boolean Satisfiability constraints, which enables both data efficiency and fast inference. We evaluate our system on textbook phonology problems and datasets from the literature, and show that it achieves high accuracy at interactive speeds.
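As a rough illustration of what encoding rule learning as Boolean constraints can look like, the sketch below learns the left context of a single toy process (underlying /z/ surfacing as [s]). It is not the paper's encoding: the feature table, the observations, and the names `SEGMENTS`, `OBSERVATIONS`, `satisfies`, and `learn_context` are all hypothetical, and exhaustive enumeration of the Boolean variables stands in for a real SAT solver. Each feature gets two Boolean variables, one saying whether the rule mentions the feature and one giving its required value; every observation contributes one constraint, and the consistent assignment that mentions the fewest features is kept.

```python
from itertools import product

FEATURES = ("voice", "nasal", "vowel")

# Hypothetical toy feature table; values are illustrative, not from the paper.
SEGMENTS = {
    "p": (False, False, False),
    "t": (False, False, False),
    "k": (False, False, False),
    "d": (True, False, False),
    "g": (True, False, False),
    "n": (True, True, False),
    "a": (True, False, True),
}

# Hypothetical data: (preceding segment, whether underlying /z/ surfaced as [s]).
OBSERVATIONS = [
    ("t", True), ("k", True), ("p", True),
    ("d", False), ("g", False), ("n", False), ("a", False),
]


def satisfies(care, value, observations):
    """Check one Boolean assignment against the constraint from each observation."""
    for segment, changed in observations:
        feats = SEGMENTS[segment]
        # The context matches when every feature the rule mentions agrees.
        matches = all(not c or v == f for c, v, f in zip(care, value, feats))
        if matches != changed:
            return False
    return True


def learn_context(observations):
    """Return the consistent context that mentions the fewest features, or None."""
    n = len(FEATURES)
    best = None
    # Brute-force enumeration stands in for a SAT solver in this sketch.
    for bits in product([False, True], repeat=2 * n):
        care, value = bits[:n], bits[n:]
        if not satisfies(care, value, observations):
            continue
        if best is None or sum(care) < sum(best[0]):
            best = (care, value)
    if best is None:
        return None
    care, value = best
    return {name: v for name, c, v in zip(FEATURES, care, value) if c}


print(learn_context(OBSERVATIONS))
```

On this toy data the only minimal consistent context is `{'voice': False}`, that is, the change applies after a voiceless segment. Preferring the assignment with the fewest mentioned features is one simple way to favor short, readable rules; a real solver would replace the enumeration once the search space grows.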
