Discriminative Vector for Spoken Language Recognition

We propose a spoken language recognition system based on discriminative vectors. Parallel phone recognizers serve as the voice tokenization front-end, vector space modeling then converts the resulting phonotactic features into vectors, and the final classification is carried out on the discriminative vectors. To build these vectors, we design an ensemble of discriminative binary classifiers whose output values form a discriminative vector, also referred to as an output code, that represents the high-dimensional phonotactic features. For 30-second trials, the system achieves equal error rates of 1.95%, 3.02%, and 4.9% on the 1996, 2003, and 2005 NIST LRE databases, respectively.
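
The pipeline described above can be illustrated with a minimal sketch. It is not the system evaluated here: it assumes a single token stream instead of parallel phone recognizers, bigram relative frequencies as the phonotactic vector, one-vs-rest perceptrons standing in for the discriminative binary classifiers, and a nearest-centroid rule as the final classifier. All function names and the toy phone strings are hypothetical.

```python
from collections import Counter

def bigram_vector(tokens, vocab):
    """Map a phone-token sequence to relative bigram frequencies over `vocab`."""
    counts = Counter(zip(tokens, tokens[1:]))
    total = sum(counts.values()) or 1  # guard against sequences shorter than 2 tokens
    return [counts[bigram] / total for bigram in vocab]

def score(x, classifier):
    """Signed output of one linear binary classifier (weights, bias)."""
    w, b = classifier
    return sum(wi * xi for wi, xi in zip(w, x)) + b

def train_perceptron(X, y, epochs=50):
    """Train a linear binary classifier with the perceptron rule; y holds +1/-1."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        for x, t in zip(X, y):
            if t * score(x, (w, b)) <= 0:  # misclassified or on the boundary
                w = [wi + t * xi for wi, xi in zip(w, x)]
                b += t
    return w, b

def discriminative_vector(x, classifiers):
    """Stack the outputs of all binary classifiers into one output code."""
    return [score(x, c) for c in classifiers]

# Toy "tokenized" utterances (assumption: made-up phone strings, one stream).
train = {
    "lang_a": ["a b a b a", "a b a b"],
    "lang_b": ["c d c d c", "d c d c"],
    "lang_c": ["e f e f e", "e f e f"],
}
languages = list(train)
samples = [(s.split(), lang) for lang, utts in train.items() for s in utts]
vocab = sorted({bg for toks, _ in samples for bg in zip(toks, toks[1:])})
X = [bigram_vector(toks, vocab) for toks, _ in samples]
labels = [lang for _, lang in samples]

# Ensemble of binary classifiers: one language against the rest (assumption).
classifiers = [
    train_perceptron(X, [1 if l == lang else -1 for l in labels])
    for lang in languages
]

# Final classifier on the discriminative vectors: nearest class centroid (assumption).
codes = [discriminative_vector(x, classifiers) for x in X]
centroids = {}
for lang in languages:
    members = [c for c, l in zip(codes, labels) if l == lang]
    centroids[lang] = [sum(col) / len(members) for col in zip(*members)]

def predict(tokens):
    """Classify a token sequence by its closest centroid in output-code space."""
    code = discriminative_vector(bigram_vector(tokens, vocab), classifiers)
    return min(
        centroids,
        key=lambda lang: sum((c - m) ** 2 for c, m in zip(code, centroids[lang])),
    )

print(predict("a b a a b a b".split()))
```

The point of the sketch is the dimensionality reduction: each utterance is first a vector over all bigrams, then a much shorter vector with one entry per binary classifier, and only that short vector reaches the final decision rule.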
