A Vector Space Modeling Approach to Spoken Language Identification

We propose a novel approach to automatic spoken language identification (LID) based on vector space modeling (VSM). We assume that the overall sound characteristics of all spoken languages can be covered by a universal collection of acoustic units, which can be characterized by acoustic segment models (ASMs). A spoken utterance is then decoded into a sequence of ASM units. The ASM framework furthers the idea of language-independent phone models for LID by introducing an unsupervised learning procedure that circumvents the need for phonetic transcription. Analogous to representing a text document as a term vector, we convert a spoken utterance into a feature vector whose attributes represent the co-occurrence statistics of the acoustic units. We can then build a vector space classifier for LID. The proposed VSM approach leads to a discriminative classifier backend, which is shown to outperform a likelihood-based n-gram language modeling (LM) backend for long utterances. We evaluated the proposed VSM framework on the 1996 and 2003 NIST Language Recognition Evaluation (LRE) databases, achieving equal error rates (EERs) of 2.75% and 4.02% on the 1996 and 2003 LRE 30-s tasks, respectively, which are among the best results reported on these popular tasks.
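The conversion from a decoded unit sequence to a feature vector can be illustrated with a minimal sketch. It assumes the utterance has already been decoded into a list of unit labels; the labels (`u1`, `u2`, `u3`), the function names, and the toy training data are hypothetical. The nearest-centroid cosine classifier is a simple stand-in chosen for brevity, not the discriminative backend evaluated in the paper.

```python
from collections import Counter
from math import sqrt

def ngram_vector(units, max_n=2):
    """Relative frequencies of all 1..max_n-grams in a unit sequence."""
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(units) - n + 1):
            counts[tuple(units[i:i + n])] += 1
    total = sum(counts.values())
    return {k: v / total for k, v in counts.items()} if total else {}

def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(k, 0.0) for k, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def centroid(vectors):
    """Average a list of sparse vectors into one prototype vector."""
    acc = Counter()
    for v in vectors:
        for k, w in v.items():
            acc[k] += w
    return {k: w / len(vectors) for k, w in acc.items()}

def classify(units, centroids, max_n=2):
    """Pick the language whose centroid is closest in cosine similarity."""
    vec = ngram_vector(units, max_n)
    return max(centroids, key=lambda lang: cosine(vec, centroids[lang]))

# Toy training data: hypothetical unit sequences for two hypothetical languages.
train = {
    "lang_a": [["u1", "u2", "u1", "u2", "u3"], ["u1", "u2", "u3", "u1", "u2"]],
    "lang_b": [["u3", "u3", "u2", "u1", "u3"], ["u2", "u1", "u3", "u3", "u2"]],
}
centroids = {
    lang: centroid([ngram_vector(seq) for seq in seqs])
    for lang, seqs in train.items()
}
predicted = classify(["u1", "u2", "u1", "u2"], centroids)
print(predicted)
```

Storing the vectors as sparse dictionaries keeps the sketch practical when the number of possible n-grams is large relative to the number observed in any one utterance.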
