GMM Based Language Identification System Using Robust Features

In this work, we propose new features for a GMM-based spoken language identification (LID) system. A two-stage approach is followed to extract the proposed features. MFCCs and formants are extracted from a large corpus covering all languages under consideration. In the first stage, the MFCCs and formants are concatenated to form a feature vector; K clusters are formed from these feature vectors, and one Gaussian is fit to each cluster. In the second stage, each feature vector is evaluated against each of the K Gaussians, and the resulting K probabilities are taken as the elements of the proposed new feature vector, yielding a K-element representation. This procedure for deriving the new feature vector is common to both the training and testing phases. In the training phase, K-element feature vectors are generated from language-specific speech corpora and language-specific GMMs are trained. In the testing phase, the same procedure is followed to extract K-element feature vectors from the unknown speech utterance, which are then evaluated against the language-specific GMMs. Additionally, language-specific a priori knowledge is used to further improve recognition performance. The experiments are carried out on the OGI database, and the LID performance is nearly 100%.
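The following is a minimal sketch of the two-stage pipeline described above, not the authors' implementation. It assumes frame-level MFCC+formant vectors are already extracted into NumPy arrays and uses scikit-learn's GaussianMixture both for the K shared Gaussians and for the per-language models; interpreting the "K probabilities" as component posteriors (via predict_proba) and the a priori knowledge as simple language priors are assumptions made for illustration.

```python
# Sketch of the two-stage feature extraction and GMM-based LID pipeline.
# Assumptions: MFCC+formant vectors are precomputed; the K shared Gaussians
# are obtained by fitting a single K-component GMM to the pooled data.
import numpy as np
from sklearn.mixture import GaussianMixture

K = 32  # number of shared Gaussians = dimensionality of the new feature (assumed value)

def fit_shared_gaussians(pooled_features, k=K, seed=0):
    """Stage 1: model the pooled MFCC+formant vectors from all languages
    with K Gaussians (here, one K-component GMM)."""
    shared = GaussianMixture(n_components=k, covariance_type="diag", random_state=seed)
    shared.fit(pooled_features)
    return shared

def to_k_element_features(shared, features):
    """Stage 2: evaluate each frame against the K Gaussians; the K
    per-component probabilities form the new feature vector."""
    return shared.predict_proba(features)  # shape: (n_frames, K)

def train_language_gmms(shared, features_by_language, n_components=8, seed=0):
    """Train one GMM per language on the K-element feature vectors."""
    models = {}
    for lang, feats in features_by_language.items():
        gmm = GaussianMixture(n_components=n_components, covariance_type="diag", random_state=seed)
        gmm.fit(to_k_element_features(shared, feats))
        models[lang] = gmm
    return models

def identify(shared, models, utterance_features, priors=None):
    """Score an unknown utterance against every language GMM; the optional
    priors stand in for the language-specific a priori knowledge."""
    new_feats = to_k_element_features(shared, utterance_features)
    scores = {}
    for lang, gmm in models.items():
        score = gmm.score(new_feats)  # mean per-frame log-likelihood
        if priors is not None:
            score += np.log(priors[lang])
        scores[lang] = score
    return max(scores, key=scores.get)
```

In this sketch the decision rule is a simple maximum over per-language log-likelihoods (plus an optional log prior); the choice of K, the number of per-language mixture components, and diagonal covariances are placeholder settings, not values reported in the paper.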
