GMM based language identification system using robust features

In this work, we propose new feature vectors for a spoken language identification (LID) system. Mel-frequency cepstral coefficients (MFCCs) and formant frequencies are derived from short-time windowed frames of the speech signal, with the formant frequencies extracted by linear prediction (LP) analysis. From these two kinds of features, new feature vectors are derived using cluster-based computation. A GMM-based classifier is designed using these new feature vectors, and language-specific a priori knowledge is applied to the recognition output. Experiments carried out on the OGI database show improved LID performance.
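As an illustration of the classification step, the following is a minimal sketch of GMM-based language scoring. It assumes diagonal-covariance mixtures, one model per language, and that the a priori knowledge enters as an additive log prior on each language's score. None of these choices is specified above. The names `identify_language`, `TOY_MODELS`, and `TOY_FRAMES` are hypothetical, and the toy two-dimensional vectors merely stand in for the proposed feature vectors.

```python
import math


def log_gauss_diag(x, mean, var):
    """Log density of a diagonal-covariance Gaussian evaluated at vector x."""
    return -0.5 * sum(
        math.log(2.0 * math.pi * v) + (xi - m) ** 2 / v
        for xi, m, v in zip(x, mean, var)
    )


def gmm_frame_loglik(x, gmm):
    """Log-likelihood of one frame under a GMM given as (weight, mean, var) tuples."""
    terms = [math.log(w) + log_gauss_diag(x, m, v) for w, m, v in gmm]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def identify_language(frames, models, log_priors=None):
    """Return the best-scoring language and the per-language scores.

    Each score is the sum of frame log-likelihoods under that language's GMM.
    Assumption: prior knowledge is added as a log prior per language.
    """
    scores = {}
    for lang, gmm in models.items():
        total = sum(gmm_frame_loglik(x, gmm) for x in frames)
        if log_priors is not None:
            total += log_priors[lang]
        scores[lang] = total
    best = max(scores, key=scores.get)
    return best, scores


# Hypothetical toy models: two mixture components per language, 2-D features.
TOY_MODELS = {
    "lang_a": [(0.5, [0.0, 0.0], [1.0, 1.0]), (0.5, [1.0, 1.0], [1.0, 1.0])],
    "lang_b": [(0.5, [4.0, 4.0], [1.0, 1.0]), (0.5, [5.0, 5.0], [1.0, 1.0])],
}

# Hypothetical test utterance: three frames lying near the lang_a components.
TOY_FRAMES = [[0.2, -0.1], [0.9, 1.1], [0.5, 0.4]]

if __name__ == "__main__":
    best, scores = identify_language(TOY_FRAMES, TOY_MODELS)
    print(best, scores)
```

In practice the mixture parameters would be trained per language, for example with the EM algorithm, on the feature vectors extracted from training speech. The hand-set values here only demonstrate the scoring rule.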
