Phonotactic Modeling of Extremely Low Resource Languages

This paper presents a novel approach to language modeling for low-resource languages. We propose a word prediction model based on multi-variant n-gram abstraction with weighted confidence levels. We demonstrate a significant improvement in word recall over the "traditional" Kneser-Ney back-off model for most of the low-resource languages examined.
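
To make the baseline and the metric concrete, the following is a minimal sketch, not the paper's implementation. It assumes a bigram model, the interpolated variant of Kneser-Ney smoothing (the abstract names a back-off variant), a fixed discount of 0.75, and a reading of "word recall" as the fraction of test positions where the true next word is among the top-k predictions. All function names and the toy corpus are hypothetical.

```python
from collections import Counter, defaultdict


def train_kn_bigram(sentences, discount=0.75):
    """Return (prob, vocab) for an interpolated Kneser-Ney bigram model (assumed variant)."""
    bigrams = Counter()
    for words in sentences:
        tokens = ["<s>"] + list(words)
        bigrams.update(zip(tokens, tokens[1:]))

    context_count = Counter()     # c(v): how often v occurs as a context
    followers = defaultdict(set)  # distinct words seen after v
    preceders = defaultdict(set)  # distinct contexts seen before w
    for (v, w), c in bigrams.items():
        context_count[v] += c
        followers[v].add(w)
        preceders[w].add(v)
    n_types = len(bigrams)        # number of distinct bigram types

    def prob(w, v):
        # Continuation probability: in how many distinct contexts does w appear?
        p_cont = len(preceders.get(w, ())) / n_types
        c_v = context_count.get(v, 0)
        if c_v == 0:              # unseen context: use the lower-order estimate only
            return p_cont
        discounted = max(bigrams.get((v, w), 0) - discount, 0.0) / c_v
        weight = discount * len(followers[v]) / c_v
        return discounted + weight * p_cont

    return prob, sorted(preceders)


def predict(prob, vocab, context, k=3):
    """Top-k candidate next words for a one-word context."""
    return sorted(vocab, key=lambda w: -prob(w, context))[:k]


def recall_at_k(prob, vocab, sentences, k=3):
    """Fraction of positions whose true word is in the top-k predictions (assumed metric)."""
    hits = total = 0
    for words in sentences:
        tokens = ["<s>"] + list(words)
        for v, w in zip(tokens, tokens[1:]):
            hits += w in predict(prob, vocab, v, k)
            total += 1
    return hits / total if total else 0.0


# Toy data, purely illustrative.
train = [["the", "cat", "sat"], ["the", "cat", "ran"], ["the", "dog", "sat"]]
prob, vocab = train_kn_bigram(train)
top_after_the = predict(prob, vocab, "the", k=1)
toy_recall = recall_at_k(prob, vocab, [["the", "cat", "sat"]], k=1)
```

A proposed model would be scored with the same `recall_at_k` function on held-out text, so that any difference in recall reflects the model rather than the metric.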
