Paper information - Target-oriented phone selection from universal phone set for spoken language recognition

Target-oriented phone selection from universal phone set for spoken language recognition

This paper studies a target-oriented phone selection strategy for constructing phone tokenizers in the Parallel Phone Recognizers followed by Vector Space Model (PPR-VSM) paradigm of spoken language recognition. With this strategy, one derives a set of target-oriented phone tokenizers (TOPT), each having a subset of phones with high discriminative ability for a target language. Two phone selection methods are proposed to derive such phone subsets from a phone recognizer. We show that the TOPTs derived from a universal phone recognizer (UPR) outperform those derived from language-specific phone recognizers. The TOPT front-end derived from a UPR also consistently outperforms the UPR front-end without involving additional acoustic modeling. By including multiple TOPTs in the PPR, we achieve equal error rates (EERs) of 1.33%, 1.75% and 2.80% on the NIST 1996, 2003 and 2007 LRE databases, respectively, for 30-second closed-set tests.
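For readers unfamiliar with the metric quoted above, the equal error rate is the operating point at which the false-acceptance rate and the false-rejection rate of a detector coincide. The following is a minimal sketch of one common way to approximate it from two lists of detection scores. It is an illustration only and not the scoring procedure used in the paper; the function name `equal_error_rate` and the example scores are hypothetical.

```python
def equal_error_rate(target_scores, nontarget_scores):
    """Approximate the EER (as a fraction in [0, 1]) by a threshold sweep.

    Assumption for this sketch: a trial is accepted when its score is
    greater than or equal to the threshold.
    """
    if not target_scores or not nontarget_scores:
        raise ValueError("both score lists must be non-empty")

    # Every observed score is a candidate decision threshold.
    thresholds = sorted(set(target_scores) | set(nontarget_scores))

    best_gap = None
    eer = None
    for t in thresholds:
        # False rejection: a target-language trial scored below the threshold.
        frr = sum(s < t for s in target_scores) / len(target_scores)
        # False acceptance: a non-target trial scored at or above the threshold.
        far = sum(s >= t for s in nontarget_scores) / len(nontarget_scores)
        gap = abs(far - frr)
        # Keep the threshold where the two error rates are closest.
        if best_gap is None or gap < best_gap:
            best_gap = gap
            eer = (far + frr) / 2
    return eer


# Made-up scores, purely for illustration.
targets = [0.9, 0.8, 0.7, 0.3]
nontargets = [0.6, 0.4, 0.2, 0.1]
print(equal_error_rate(targets, nontargets))
```

With finite score lists the two error rates may never be exactly equal, so this sketch reports the average of the two at the threshold where they are closest; evaluation toolkits often interpolate instead.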

Rong Tong | Bin Ma | Haizhou Li | Chng Eng Siong
