Performance Improvement of Language Identification Using Transcription Based Sequential Approaches & Sequential Kernels Based SVM

This paper investigates a generative front-end based on both phonetic and prosodic features, together with two approaches based on phonetic transcription: Aggregated Phone Recognizer followed by Language Models (APRLM) and Generalized Phone Recognizer followed by Language Models (GPRLM). APRLM and GPRLM have several disadvantages: they require phonetic transcription of the speech data, and they use fewer levels of information, whereas the generative front-end, built on an ensemble of Gaussian densities, uses prosodic and phonetic information together. By contrast, Support Vector Machine (SVM)-based approaches need no transcription of the speech data, and they also performed better in our experiments. In addition, APRLM and GPRLM are more time-consuming than SVM-based approaches. We used Mel-Frequency Cepstral Coefficients (MFCC) in APRLM and GPRLM, and Shifted Delta Cepstrum (SDC) and Pitch Contour Polynomial Approximation (PCPA) features in the SVM-based methods. Probabilistic Sequence Kernel (PSK) and Generalized Linear Discriminant Sequence (GLDS) kernels are used in the SVM experiments. With PCPA features, SVM using the GLDS and PSK kernels outperforms the Gaussian mixture model (GMM) in all our language identification (LID) experiments, improving LID performance by about 2.1% and 5.9%, respectively. Combining the Probabilistic Characteristic Vector using PCPA (PCV-PCPA) with the Probabilistic Characteristic Vector using SDC (PCV-SDC) provides further improvements.
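
As an illustration of the SDC features mentioned above, the following is a minimal sketch of the usual N-d-P-k shifted delta computation: for each frame t, k delta vectors c(t + iP + d) - c(t + iP - d), i = 0, ..., k-1, are stacked into one vector. The abstract does not give the parameters used, so the defaults d=1, P=3, k=7 are an assumption, as are the function name `shifted_delta_cepstra` and the choice to clamp indices at the utterance edges.

```python
from typing import List, Sequence


def shifted_delta_cepstra(
    frames: Sequence[Sequence[float]],
    d: int = 1,   # assumed delta spread (not stated in the abstract)
    P: int = 3,   # assumed shift between consecutive delta blocks
    k: int = 7,   # assumed number of stacked delta blocks
) -> List[List[float]]:
    """Return one stacked SDC vector (length N*k) per input cepstral frame."""
    T = len(frames)
    if T == 0:
        return []

    def frame_at(t: int) -> Sequence[float]:
        # Assumption: indices outside the utterance reuse the nearest edge frame.
        return frames[min(max(t, 0), T - 1)]

    sdc = []
    for t in range(T):
        stacked: List[float] = []
        for i in range(k):
            ahead = frame_at(t + i * P + d)
            behind = frame_at(t + i * P - d)
            # Delta block i: c(t + iP + d) - c(t + iP - d)
            stacked.extend(a - b for a, b in zip(ahead, behind))
        sdc.append(stacked)
    return sdc


# Usage: a one-dimensional linear ramp stands in for real cepstral frames.
ramp = [[float(t)] for t in range(20)]
features = shifted_delta_cepstra(ramp, d=1, P=3, k=2)
```

Each output vector has N*k components, so 7 cepstral coefficients with k=7 would give 49-dimensional features under these assumed settings.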
