Phonotactic learning with neural language models

Computational models of phonotactics have much in common with language models, which assign probabilities to sequences of words. While state-of-the-art language models are implemented using neural networks, phonotactic models have not followed suit. We present several neural models of phonotactics and compare their performance to that of a commonly employed phonotactic model. We show that these models are better able to learn long-distance dependencies, do not require stipulation of a feature system, and agree more closely with human judgements. This work provides a promising starting point for future modeling of human phonotactic knowledge.
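
To make the parallel with language modeling concrete, the following is a minimal sketch of a phonotactic model as a sequence model over segments: it assigns a log-probability to a word form from smoothed segment-bigram counts. This is not one of the neural models presented in this work. It is a simple count-based illustration, and the toy lexicon, the use of one character per segment, the add-alpha smoothing, and the names `train_bigram` and `log_prob` are all assumptions made for the example.

```python
import math
from collections import Counter

BOS, EOS = "<s>", "</s>"  # word-boundary symbols (illustrative choice)


def train_bigram(words, alpha=1.0):
    """Fit an add-alpha smoothed segment-bigram model and return a scorer.

    Each character is treated as one segment, which is a simplifying
    assumption for this sketch.
    """
    bigrams, contexts = Counter(), Counter()
    vocab = {EOS}  # symbols that can be predicted
    for word in words:
        segs = [BOS] + list(word) + [EOS]
        vocab.update(word)
        for prev, cur in zip(segs, segs[1:]):
            bigrams[(prev, cur)] += 1
            contexts[prev] += 1
    vocab_size = len(vocab)

    def log_prob(word):
        """Log-probability of a word form as a sum of bigram log-probabilities."""
        segs = [BOS] + list(word) + [EOS]
        return sum(
            math.log((bigrams[(p, c)] + alpha) / (contexts[p] + alpha * vocab_size))
            for p, c in zip(segs, segs[1:])
        )

    return log_prob


# Hypothetical toy lexicon, invented for illustration only.
lexicon = ["blik", "brik", "blat", "bat", "kit"]
log_prob = train_bigram(lexicon)

# A form built from attested segment transitions should score higher
# than one that begins with unattested transitions.
print(log_prob("blit") > log_prob("lbit"))
```

A bigram model of this kind conditions only on the immediately preceding segment, so it cannot capture dependencies between non-adjacent segments. That limitation is the sort of gap a recurrent or otherwise context-sensitive neural model is meant to address.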
