Language identification using phone-based acoustic likelihoods

Applies the technique of phone-based acoustic likelihoods to the problem of language identification. The basic idea is to process the unknown speech signal by language-specific phone model sets in parallel, and to hypothesize the language associated with the model set having the highest likelihood. Using laboratory quality speech the language can be identified as French or English with better than 99% accuracy with only as little as 2 seconds of speech. On spontaneous telephone speech from the OGI corpus, the language can be identified as French or English with 82% accuracy with 10 seconds of speech. The 10 language identification rate using the OGI corpus is 59.7% with 10 seconds of signal.<<ETX>>

[1]  Marc A. Zissman,et al.  Automatic language identification using Gaussian mixture and hidden Markov models , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[2]  F. J. Goodman,et al.  Improved automatic language identification in noisy speech , 1989, International Conference on Acoustics, Speech, and Signal Processing,.

[3]  Maxine Eskénazi,et al.  Design considerations and text selection for BREF, a large French read-speech corpus , 1990, ICSLP.

[4]  Ronald A. Cole,et al.  The OGI multi-language telephone speech corpus , 1992, ICSLP.

[5]  Russell B. Ives,et al.  Development of an automatic identification system of spoken languages: Phase I , 1982, ICASSP.

[6]  A. B. Poritz,et al.  Linear predictive hidden Markov models and the speech signal , 1982, ICASSP.

[7]  Jean-Luc Gauvain,et al.  A phone-based approach to non-linguistic speech feature identification , 1995, Comput. Speech Lang..

[8]  Jean-Luc Gauvain,et al.  Cross-lingual experiments with phone recognition , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[9]  Douglas A. Reynolds,et al.  Text independent speaker identification using automatic acoustic segmentation , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[10]  Victor Zue,et al.  Automatic language identification using a segment-based approach , 1993, EUROSPEECH.

[11]  Jean-Luc Gauvain,et al.  Identifying non-linguistic speech features , 1993, EUROSPEECH.

[12]  Jean-Luc Gauvain,et al.  Identification of Non-Linguistic Speech Features , 1993, HLT.

[13]  Chin-Hui Lee,et al.  Bayesian learning for hidden Markov model with Gaussian mixture state observation densities , 1991, Speech Commun..

[14]  Jonathan G. Fiscus,et al.  Darpa Timit Acoustic-Phonetic Continuous Speech Corpus CD-ROM {TIMIT} | NIST , 1993 .

[15]  J. Foil,et al.  Language identification using noisy speech , 1986, ICASSP '86. IEEE International Conference on Acoustics, Speech, and Signal Processing.

[16]  Frank K. Soong,et al.  Continuous probabilistic acoustic map for speaker identification , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[17]  Chin-Hui Lee,et al.  Bayesian Learning of Gaussian Mixture Densities for Hidden Markov Models , 1991, HLT.

[18]  Sadaoki Furui,et al.  Concatenated phoneme models for text-variable speaker recognition , 1993, 1993 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[19]  Sadaoki Furui,et al.  Comparison of text-independent speaker recognition methods using VQ-distortion and discrete/continuous HMMs , 1992, [Proceedings] ICASSP-92: 1992 IEEE International Conference on Acoustics, Speech, and Signal Processing.

[20]  Naftali Z. Tisby On the application of mixture AR hidden Markov models to text independent speaker recognition , 1991, IEEE Trans. Signal Process..

[21]  Janet M. Baker,et al.  The Design for the Wall Street Journal-based CSR Corpus , 1992, HLT.

[22]  T. J. Edwards,et al.  Statistical models for automatic language identification , 1980, ICASSP.

[23]  Maxine Eskénazi,et al.  BREF, a large vocabulary spoken corpus for French , 1991, EUROSPEECH.

[24]  Seiichi Nakagawa,et al.  Speaker-independent, text-independent language identification by HMM , 1992, ICSLP.

[25]  Ronald A. Cole,et al.  A comparison of approaches to automatic language identification using telephone speech , 1993, EUROSPEECH.

[26]  M. Eskenazi,et al.  The French language database: Defining, planning, and recording a large database , 1984, ICASSP.

[27]  Aaron E. Rosenberg,et al.  Sub-word unit talker verification using hidden Markov models , 1990, International Conference on Acoustics, Speech, and Signal Processing.

[28]  A. House,et al.  Toward automatic identification of the language of an utterance. I. Preliminary methodological con , 1977 .

[29]  Ronald A. Cole,et al.  Automatic segmentation and identification of ten languages using telephone speech , 1992, ICSLP.