Comparison of two phonetic approaches to language identification

This paper presents two unsupervised approaches to Automatic Language Identification (ALI) based on segmental preprocessing. In the Global Segmental Model approach, the language system is modeled by a Gaussian Mixture Model (GMM) trained on automatically detected segments. In the Phonetic Differentiated Model approach, unsupervised vowel/non-vowel detection is performed and the language model is defined by two GMMs, one modeling the vowel segments and the other modeling the remaining segments. Neither approach requires labeled data. GMMs are initialized using an efficient data-driven variant of the LBG algorithm: the LBG-Rissanen algorithm. On 5 languages from the OGI MLTS corpus, in a closed-set identification task, each system reaches 85% correct identification on 45-second utterances from male speakers. Merging the two systems raises this to 91%.
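The decision rule implied by the two approaches can be sketched as follows. This is a minimal illustration, not the authors' implementation: it assumes diagonal-covariance GMMs that are already trained, a maximum-likelihood decision over the closed set of languages, and, for the Phonetic Differentiated Model, a simple sum of the vowel and non-vowel log-likelihoods. The function names, the `(weight, mean, var)` component layout, and the toy models `lang_a` and `lang_b` are hypothetical.

```python
import math


def log_gaussian_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian at vector x."""
    return sum(
        -0.5 * (math.log(2 * math.pi * v) + (xi - m) ** 2 / v)
        for xi, m, v in zip(x, mean, var)
    )


def gmm_log_likelihood(segments, gmm):
    """Total log-likelihood of segment vectors under a GMM.

    gmm is a list of (weight, mean, var) components (assumed layout).
    """
    total = 0.0
    for x in segments:
        logs = [math.log(w) + log_gaussian_diag(x, m, v) for w, m, v in gmm]
        peak = max(logs)  # log-sum-exp for numerical stability
        total += peak + math.log(sum(math.exp(l - peak) for l in logs))
    return total


def identify_global(segments, models):
    """Global Segmental Model sketch: one GMM per language."""
    return max(models, key=lambda lang: gmm_log_likelihood(segments, models[lang]))


def identify_differentiated(vowels, others, models):
    """Phonetic Differentiated Model sketch: (vowel GMM, other GMM) per language.

    Summing the two log-likelihoods is an assumption of this sketch.
    """
    def score(lang):
        vowel_gmm, other_gmm = models[lang]
        return gmm_log_likelihood(vowels, vowel_gmm) + gmm_log_likelihood(others, other_gmm)

    return max(models, key=score)


# Toy one-dimensional models (hypothetical values, for illustration only).
toy_global = {
    "lang_a": [(1.0, [0.0], [1.0])],
    "lang_b": [(1.0, [3.0], [1.0])],
}
print(identify_global([[0.1], [-0.2], [0.3]], toy_global))
```

In practice the segment vectors would come from the segmental preprocessing and the GMM parameters from training initialized with LBG-Rissanen; neither step is reproduced here.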
