Language identification based on improved training and classification

Language identification is the automatic detection of the language spoken in a speech utterance. In automatic speech translation, for example, the spoken language must be identified before any recognition or translation can take place. In this paper we propose two methods, one for training and one for testing language classifiers. The first uses the Kullback-Leibler divergence (KLD) to improve the training of Gaussian mixture models (GMMs); the second uses Frame Selection Decoding (FSD) for classification. Acoustic features are extracted directly from the speech signal, and delta and shifted delta cepstral parameters are appended to the features to capture temporal variation. The resulting system improves significantly on the baseline: it achieves a language identification performance of 78.6% across 11 languages of the OGI database, a relative error rate reduction of 27.95% compared with a baseline system that uses GMM-UBM for classification.
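To make the feature step concrete, the following is a minimal sketch of shifted delta cepstra (SDC) computed from a matrix of cepstral frames. The abstract does not state the SDC configuration, so the parameters `d`, `P`, and `k` (delta spread, block shift, number of blocks), their default values, and the edge-replication padding are assumptions made here for illustration only; the function name `shifted_delta_cepstra` is likewise hypothetical.

```python
import numpy as np


def shifted_delta_cepstra(cep, d=1, P=3, k=7):
    """Stack k delta blocks, each shifted by P frames (illustrative sketch).

    cep : array of shape (T, N), one N-dimensional cepstral vector per frame.
    Block i at frame t is cep[t + i*P + d] - cep[t + i*P - d].
    Frames outside the utterance are filled by repeating the edge frames
    (an assumption; other boundary treatments are possible).
    Returns an array of shape (T, N * k).
    """
    cep = np.asarray(cep, dtype=float)
    T, _ = cep.shape
    # d extra frames before the start, (k-1)*P + d extra frames after the end
    padded = np.pad(cep, ((d, (k - 1) * P + d), (0, 0)), mode="edge")
    blocks = []
    for i in range(k):
        # original frame t sits at padded index t + d
        ahead = padded[i * P + 2 * d : i * P + 2 * d + T]   # cep[t + i*P + d]
        behind = padded[i * P : i * P + T]                  # cep[t + i*P - d]
        blocks.append(ahead - behind)
    return np.hstack(blocks)
```

A typical use would append the result to the static features, for example `features = np.hstack([cep, shifted_delta_cepstra(cep)])`, before training the per-language models.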
