Language identification using multiple knowledge sources

Language identification experiments were carried out on language pairs drawn from seven of the languages in the OGI Multi-language Telephone Speech Corpus. The work builds on previous work but introduces new techniques that exploit the acoustic and phonetic differences between the languages. Subword hidden Markov models for the pair of languages are matched to unknown utterances, yielding three measures: the acoustic match, the phoneme frequencies and frequency histograms. Each of these measures gives 80 to 90% accuracy in discriminating language pairs. However, these multiple knowledge sources are also combined to give improved results. Majority decision, logistic regression and a linear classifier were compared as data fusion techniques. The linear classifier performed best, giving an average accuracy of 89 to 93% on the pairs from the seven languages.
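
As a rough illustration of how two of the fusion techniques named above differ, the following minimal sketch combines three per-utterance measures for one language pair. It assumes, purely for illustration, that each measure has already been reduced to a signed score (positive favours language A, non-positive favours language B). The function names, the example scores and the equal weights are hypothetical and are not taken from the paper.

```python
from typing import Sequence


def majority_decision(scores: Sequence[float]) -> int:
    """Return 0 (language A) if more than half of the scores are positive, else 1 (language B).

    Assumption: each score is a signed preference from one knowledge source.
    """
    votes_for_a = sum(1 for s in scores if s > 0)
    return 0 if 2 * votes_for_a > len(scores) else 1


def linear_fusion(scores: Sequence[float], weights: Sequence[float], bias: float = 0.0) -> int:
    """Return 0 (language A) if the weighted sum of scores plus bias is positive, else 1.

    Assumption: the weights and bias would normally be estimated from training data;
    here they are supplied by the caller as placeholders.
    """
    if len(scores) != len(weights):
        raise ValueError("scores and weights must have the same length")
    total = bias + sum(w * s for w, s in zip(weights, scores))
    return 0 if total > 0 else 1


# Hypothetical scores: acoustic match, phoneme frequencies, frequency histograms.
example_scores = [0.2, 0.1, -1.5]
hypothetical_weights = [1.0, 1.0, 1.0]

vote_result = majority_decision(example_scores)
linear_result = linear_fusion(example_scores, hypothetical_weights)
print("majority decision:", "A" if vote_result == 0 else "B")
print("linear fusion:", "A" if linear_result == 0 else "B")
```

In this made-up case two weak scores outvote one strong score under majority decision, whereas the linear combination takes the magnitudes into account and decides the other way. This shows only how the two rules can disagree, not why the linear classifier performed best in the reported experiments.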
