The L2F Language Verification Systems for Albayzin-2010 Evaluation

This paper describes the Language Verification systems that INESC-ID's Spoken Language Systems Laboratory (L2F) submitted to the Albayzin-2010 evaluation. The primary submission fuses six individual subsystems: one Gaussian supervector system with support vector machines, which operates on acoustic features extracted by a shifted-delta front end, and five Phone Recognition and Language Modeling (PRLM) detectors, each based on a different phone tokenizer. Two contrastive systems were also developed. Language detection results were submitted for every system under all evaluation conditions. The distinguishing feature of the systems developed for this evaluation is that separate language models for clean and noisy conditions were trained for each target language. Results are reported for the different systems and evaluation conditions.
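As an illustration of the shifted-delta front end mentioned above, the following is a minimal sketch of the standard shifted delta cepstra computation, not the L2F implementation. The function name `shifted_deltas` and the default parameters (`d=1`, `P=3`, `k=7`, the commonly used 7-1-3-7 style configuration) are assumptions; the abstract does not state which configuration was used.

```python
def shifted_deltas(frames, d=1, P=3, k=7):
    """Compute shifted delta features from a sequence of cepstral frames.

    frames: list of per-frame cepstral vectors (lists of floats).
    d: delta spread, P: shift between blocks, k: number of stacked blocks.
    These defaults are an assumed, commonly used configuration.
    """
    n_frames = len(frames)
    out = []
    # Emit only frames that have full context on both sides.
    last = n_frames - (k - 1) * P - d
    for t in range(d, last):
        vec = []
        for i in range(k):
            ahead = frames[t + i * P + d]
            behind = frames[t + i * P - d]
            # Block i is the delta computed at a shift of i * P frames.
            vec.extend(a - b for a, b in zip(ahead, behind))
        out.append(vec)
    return out


# Toy input: 30 frames of 2 coefficients that grow linearly with time.
frames = [[float(t), 2.0 * t] for t in range(30)]
sdc = shifted_deltas(frames)
```

Each output vector stacks `k` delta blocks, so its dimension is `k` times the number of input coefficients. In a Gaussian supervector system, vectors of this kind would typically be the features from which the supervectors are derived.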
