A Phonotactic Language Model for Spoken Language Identification

We have established a phonotactic language model as the solution to spoken language identification (LID). In this framework, we define a single set of acoustic tokens to represent the acoustic activities in the world's spoken languages. A voice tokenizer converts a spoken document into a text-like document of acoustic tokens. Thus a spoken document can be represented by a count vector of acoustic tokens and token n-grams in the vector space. We apply latent semantic analysis to the vectors, in the same way that it is applied in information retrieval, in order to capture salient phonotactics present in spoken documents. The vector space modeling of spoken utterances constitutes a paradigm shift in LID technology and has proven to be very successful. It presents a 12.4% error rate reduction over one of the best reported results on the 1996 NIST Language Recognition Evaluation database.

[1]  Gerard Salton,et al.  The SMART Retrieval System , 1971 .

[2]  Ross Wilkinson,et al.  Experiments in spoken document retrieval using phoneme n-grams , 2000, Speech Commun..

[3]  Worldbet,et al.  ASCII Phonetic Symbols for the World s Languages Worldbet , 1994 .

[4]  Yuen Ren Chao,et al.  Human Behavior and the Principle of Least Effort: An Introduction to Human Ecology , 1950 .

[5]  Richard O. Duda,et al.  Pattern classification and scene analysis , 1974, A Wiley-Interscience publication.

[6]  M. Sugiyama,et al.  Automatic language recognition using acoustic features , 1991, [Proceedings] ICASSP 91: 1991 International Conference on Acoustics, Speech, and Signal Processing.

[7]  Susan T. Dumais,et al.  Using Linear Algebra for Intelligent Information Retrieval , 1995, SIAM Rev..

[8]  W. B. Cavnar,et al.  N-gram-based text categorization , 1994 .

[9]  Yonghong Yan,et al.  An approach to automatic language identification based on language-dependent phone recognition , 1995, 1995 International Conference on Acoustics, Speech, and Signal Processing.

[10]  Bin Ma,et al.  An acoustic segment modeling approach to automatic language identification , 2005, INTERSPEECH.

[11]  Ronald A. Cole,et al.  Perceptual benchmarks for automatic language identification , 1994, Proceedings of ICASSP '94. IEEE International Conference on Acoustics, Speech and Signal Processing.

[12]  Bob Carpenter,et al.  Vector-based Natural Language Call Routing , 1999, Comput. Linguistics.

[13]  Marc A. Zissman,et al.  Comparison of : Four Approaches to Automatic Language Identification of Telephone Speech , 2004 .

[14]  Richard A. Harshman,et al.  Indexing by Latent Semantic Analysis , 1990, J. Am. Soc. Inf. Sci..

[15]  William M. Campbell,et al.  Acoustic, phonetic, and discriminative approaches to automatic language identification , 2003, INTERSPEECH.

[16]  Douglas A. Reynolds,et al.  Language identification using Gaussian mixture model tokenization , 2002, 2002 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  J.R. Bellegarda,et al.  Exploiting latent semantic information in statistical language modeling , 2000, Proceedings of the IEEE.

[18]  Karen Spärck Jones A statistical interpretation of term specificity and its application in retrieval , 2021, J. Documentation.