Spoken Language Recognition - A Step towards Multilinguality in Speech Processing

In recent years, automatic recognition of spoken languages has become an important feature in a variety of speech-enabled multilingual applications which, besides accuracy, also demand efficient and "linguistically scalable" algorithms. This paper deals with a particularly successful approach based on phonotactic-acoustic features and presents systems for language identification as well as for unknown-language rejection. An architecture with multi-path decoding, improved phonotactic models using binary-tree structures, and acoustic pronunciation models serves as a framework for experiments and discussion on these two tasks. In particular, language identification results on a telephone-speech task (NIST'95 evaluation) in six and nine languages are presented together with results from a perceptual experiment carried out with human listeners. The performance of language rejection based on phonotactic modeling combined with a monolingual LVCSR system in the domain of broadcast news transcription is also reported. Besides yielding state-of-the-art performance, the described systems are computationally inexpensive and easily extensible (scalable) to new languages without the need for linguistic experts.
