A Target-Oriented Phonotactic Front-End for Spoken Language Recognition

This paper presents a strategy to optimize the phonotactic front-end for spoken language recognition. For each target language, we select from an existing phone recognizer's inventory the subset of phones that best discriminates that language, and use each such subset to construct a target-oriented phone tokenizer (TOPT). We examine different approaches to constructing such tokenizers for the front-end of a system based on parallel phone recognizers followed by vector space modeling (PPR-VSM). We show that target-oriented phone tokenizers derived from language-specific phone recognizers are more effective than the original parallel phone recognizers. Our experimental results also show that target-oriented phone tokenizers derived from universal phone recognizers perform better than those derived from language-specific phone recognizers. Using the proposed tokenizers as the phonotactic front-end significantly improves language recognition performance without requiring additional training samples. We achieve equal error rates (EER) of 1.27%, 1.42%, and 2.73% on the NIST 1996, 2003, and 2007 LRE databases, respectively, for 30-s closed-set tests. This system is one of the subsystems in IIR's submission to NIST 2007 LRE.
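The idea of selecting a discriminative phone subset per target language can be illustrated with a minimal sketch. The abstract does not state the selection criterion, so the score used here (absolute difference in relative phone frequency between target and non-target data) is only an assumption chosen for illustration. The function names, the `k` parameter, and the `<oth>` filler symbol are likewise hypothetical and not taken from the paper.

```python
from collections import Counter


def phone_frequencies(utterances):
    """Relative frequency of each phone over a list of phone sequences."""
    counts = Counter(p for utt in utterances for p in utt)
    total = sum(counts.values())
    return {p: c / total for p, c in counts.items()} if total else {}


def select_target_phones(target_utts, other_utts, k):
    """Return the k phones whose relative frequency differs most between
    target-language and non-target data (assumed criterion, not the paper's)."""
    f_target = phone_frequencies(target_utts)
    f_other = phone_frequencies(other_utts)
    inventory = sorted(set(f_target) | set(f_other))
    scores = {
        p: abs(f_target.get(p, 0.0) - f_other.get(p, 0.0)) for p in inventory
    }
    # Sort by descending score; break ties alphabetically for determinism.
    ranked = sorted(inventory, key=lambda p: (-scores[p], p))
    return ranked[:k]


def retokenize(utterance, selected, filler="<oth>"):
    """Keep selected phones and map all others to a filler symbol
    (a hypothetical choice; dropping them would be another option)."""
    return [p if p in selected else filler for p in utterance]


if __name__ == "__main__":
    target = [["a", "b", "a", "a"]]
    others = [["b", "c", "b", "c"]]
    subset = select_target_phones(target, others, k=2)
    print(subset)
    print(retokenize(["a", "b", "c"], set(subset)))
```

In a PPR-VSM style system, one such subset would be derived per target language, and the retokenized phone sequences would then feed the n-gram statistics of the vector space model.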
