Automatic language identification using discrete hidden Markov model

Recent research on automatic language identification (LID) has studied the phonotactic approach, in which all training utterances are passed through a tokenizer to obtain phonetic sequences, and these sequences are used to train a language model for each language. The true transcription of the utterances is ignored entirely, although the information it carries may have important discriminating power for language identification. In this paper, we propose using a discrete hidden Markov model (DHMM) that takes account of the potential error patterns of the acoustic tokenizer and incorporates the transcription of the utterances into language model training. Furthermore, with the DHMM approach, LID using multiple phonetic tokenizers can be treated simply as supplying multi-dimensional features to the DHMM, which allows a joint decision to be made earlier in the process. On a closed set of six languages from the OGI telephone speech corpus, a system employing this approach achieves 59.00% and 68.33% identification accuracy on 10-sec and 45-sec speech respectively, while the phonotactic approach gives 57.00% and 77.50% on 10-sec and 45-sec speech when the phone recognizer uses three-state, three-mixture HMMs.
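
The idea can be illustrated with a minimal sketch. It is not the system evaluated above: it assumes that each language has one discrete HMM whose hidden states stand for the transcribed phones, whose transition matrix plays the role of a phone-level language model, and whose emission matrix describes how the tokenizer confuses phones. The two toy languages "A" and "B", their probabilities, and all function names are hypothetical and chosen only for illustration.

```python
import math

NEG_INF = float("-inf")

def safe_log(p):
    """Natural log that maps zero probability to -inf instead of raising."""
    return math.log(p) if p > 0 else NEG_INF

def log_sum_exp(values):
    """Numerically stable log(sum(exp(v))) for a list of log values."""
    m = max(values)
    if m == NEG_INF:
        return NEG_INF
    return m + math.log(sum(math.exp(v - m) for v in values))

def forward_log_likelihood(obs, start, trans, emit):
    """Log P(obs | model) for a discrete HMM, using the forward algorithm.

    obs   : list of token indices produced by the tokenizer
    start : start[s]     = P(first hidden phone is s)
    trans : trans[p][s]  = P(next hidden phone is s | current is p)
    emit  : emit[s][o]   = P(tokenizer outputs o | hidden phone is s)
    """
    n = len(start)
    alpha = [safe_log(start[s]) + safe_log(emit[s][obs[0]]) for s in range(n)]
    for o in obs[1:]:
        alpha = [
            log_sum_exp([alpha[p] + safe_log(trans[p][s]) for p in range(n)])
            + safe_log(emit[s][o])
            for s in range(n)
        ]
    return log_sum_exp(alpha)

def identify(obs, models):
    """Return the best-scoring language and the per-language log-likelihoods."""
    scores = {
        lang: forward_log_likelihood(obs, *params)
        for lang, params in models.items()
    }
    return max(scores, key=scores.get), scores

# Hypothetical toy setup: two hidden phones, two tokenizer symbols.
# Both languages share the same tokenizer confusion pattern (20% error),
# but in language "A" phones tend to repeat and in "B" they tend to alternate.
CONFUSION = [[0.8, 0.2], [0.2, 0.8]]
MODELS = {
    "A": ([0.5, 0.5], [[0.9, 0.1], [0.1, 0.9]], CONFUSION),
    "B": ([0.5, 0.5], [[0.1, 0.9], [0.9, 0.1]], CONFUSION),
}

best_alternating, _ = identify([0, 1, 0, 1, 0, 1], MODELS)
best_repeating, _ = identify([0, 0, 0, 1, 1, 1], MODELS)
```

Under these toy assumptions, a token sequence is scored by summing over every hidden phone sequence that could have produced it, so tokenizer errors are absorbed by the emission matrix rather than being passed on to the language model. With several tokenizers, each observation could become a tuple of token indices with a correspondingly larger emission table; that extension is not shown here.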
