Time-Frequency Cepstral Features and Combining Discriminative Training for Phonotactic Language Recognition

The performance of a phonotactic language recognition system depends on the quality of its phone recognizers. To improve the recognizers, this paper investigates new acoustic features and discriminative training techniques for phone recognition. The commonly used features are static cepstral coefficients appended with their first- and second-order deltas; this configuration may not be optimal for phone recognition in phonotactic language recognition systems. In this paper, a time-frequency cepstral (TFC) feature is proposed, based on our previous work in acoustic language recognition systems. The feature is extracted as follows: first, a temporal discrete cosine transform (DCT) is applied to the cepstrum matrix, and then the transformed elements in a specific region are selected according to a variance maximization criterion. Different parameter settings are tested to find the optimal configuration. We also adopt the feature minimum phone error (fMPE) method to discriminatively train the phone models, yielding better phone recognition results and further improvement. The effectiveness of both techniques is demonstrated on the NIST Language Recognition Evaluation (LRE) 2007 database, covering the 30-second, 10-second, and 3-second closed-set test conditions.
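The following is a minimal sketch of the TFC extraction step described above: a temporal DCT is applied over a sliding window of static cepstra, and a low-order block of the resulting time-frequency matrix is kept as the frame feature. The function name `extract_tfc`, the window length, and the number of retained temporal DCT rows are illustrative assumptions, not the paper's exact configuration; in the paper the retained region is chosen by the variance maximization criterion.

```python
# Sketch of time-frequency cepstral (TFC) feature extraction, assuming a
# cepstrogram of shape (num_frames, num_ceps). Window length and number
# of retained DCT rows are placeholder values, not the paper's settings.
import numpy as np
from scipy.fft import dct


def extract_tfc(cepstra, win_len=9, n_time_dct=3):
    """Apply a temporal DCT over a sliding context window of static
    cepstra and keep the low-order time-frequency block per frame."""
    num_frames, num_ceps = cepstra.shape
    half = win_len // 2
    # Pad at the edges so every frame has a full context window.
    padded = np.pad(cepstra, ((half, half), (0, 0)), mode="edge")

    features = []
    for t in range(num_frames):
        block = padded[t:t + win_len, :]          # (win_len, num_ceps)
        # DCT along the time axis yields a time-frequency
        # (modulation x cepstrum) matrix for this window.
        tf = dct(block, type=2, axis=0, norm="ortho")
        # Keep only the first few temporal DCT rows as the selected
        # region; the paper selects this area by variance maximization.
        features.append(tf[:n_time_dct, :].reshape(-1))
    return np.stack(features)                     # (num_frames, n_time_dct * num_ceps)


if __name__ == "__main__":
    # Example: 200 frames of 13-dimensional static cepstra
    # produce 39-dimensional TFC vectors with these settings.
    feats = extract_tfc(np.random.randn(200, 13))
    print(feats.shape)  # (200, 39)
```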
