Phonotactic Language Recognition Using MLP Features

This paper describes a Parallel Phone Recognizers followed by Language Modeling (PPRLM) system that is efficient in terms of both performance and processing speed. The system uses context-independent phone recognizers trained on MLP features concatenated with conventional PLP and pitch features. MLP features have several properties that make them well suited to speech processing, in particular the temporal context provided at the MLP inputs and the discriminative criterion used to learn the MLP parameters. Preliminary experiments on the NIST LRE 2005 closed-set task show that the proposed system significantly improves on a PPRLM system using context-independent phone models trained on PLP features. Moreover, the proposed system performs as well as a PPRLM system using context-dependent phone models, while running six times faster.
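
As a rough illustration of the PPRLM decision rule mentioned above, the following minimal Python sketch scores a decoded phone sequence from each recognizer against one phone n-gram model per target language and picks the language with the highest combined score. It is a generic sketch under stated assumptions, not the system described in this paper: the bigram order, the add-one smoothing, the averaging across recognizers, and all names and toy data (`train_bigram`, `pprlm_decide`, `rec_a`, `rec_b`, languages `X` and `Y`) are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter


def train_bigram(sequences, vocab):
    """Return a function giving add-one-smoothed bigram log-probabilities."""
    pair_counts = Counter()
    context_counts = Counter()
    for seq in sequences:
        padded = ["<s>"] + list(seq)
        for prev, cur in zip(padded, padded[1:]):
            pair_counts[(prev, cur)] += 1
            context_counts[prev] += 1
    size = len(vocab)

    def logprob(prev, cur):
        return math.log((pair_counts[(prev, cur)] + 1) / (context_counts[prev] + size))

    return logprob


def score(seq, logprob):
    """Average bigram log-likelihood of one phone sequence (0.0 if empty)."""
    padded = ["<s>"] + list(seq)
    pairs = list(zip(padded, padded[1:]))
    if not pairs:
        return 0.0
    return sum(logprob(p, c) for p, c in pairs) / len(pairs)


def pprlm_decide(decoded, models):
    """decoded: {recognizer: phone sequence}
    models: {recognizer: {language: logprob function}}
    Averages the per-recognizer scores for each language (an assumption)."""
    languages = list(next(iter(models.values())).keys())
    totals = {
        lang: sum(score(decoded[r], models[r][lang]) for r in decoded) / len(decoded)
        for lang in languages
    }
    return max(totals, key=totals.get), totals


# Hypothetical toy data: two recognizers with different phone inventories.
models = {
    "rec_a": {
        "X": train_bigram(["abababab"], {"a", "b"}),
        "Y": train_bigram(["aabbaabb"], {"a", "b"}),
    },
    "rec_b": {
        "X": train_bigram(["xyxyxyxy"], {"x", "y"}),
        "Y": train_bigram(["xxyyxxyy"], {"x", "y"}),
    },
}
# Phone strings that each recognizer might output for one test utterance.
decoded = {"rec_a": "abab", "rec_b": "xyxy"}
best, totals = pprlm_decide(decoded, models)
print(best, totals)
```

In this toy setup the alternating test sequences match the training data of the hypothetical language X more closely, so X receives the higher score. A practical system would use higher-order n-grams, better smoothing, and a calibrated back-end to combine recognizer scores.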
