The L2F Language Recognition System for NIST LRE 2011

This document describes the language recognition systems submitted by INESC-ID’s Spoken Language Systems Laboratory (L2F) to the 2011 NIST Language Recognition Evaluation (LRE). The L2F primary system is a fusion of six individual sub-systems: four phonotactic sub-systems and two acoustic sub-systems. The major differences with respect to the previous L2F system, submitted to the NIST LRE 2009 campaign, are: a) the use of discriminative SVM modelling of expected phone counts extracted from lattices, in contrast to generative n-gram modelling of phoneme sequences, in the phonotactic sub-systems; b) the development of a single kernel-based system of Gaussian supervectors with support vector machine modelling; and c) the incorporation of a new i-vector based system with linear generative classifiers. Additionally, two contrastive systems were submitted. One fundamental particularity of the L2F submission is that a relatively small training data set was defined and used to build the various sub-systems. This “small” training data set permitted fast development and comparison of algorithms and new sub-systems.
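As a rough illustration of what a "linear generative classifier" over i-vectors can look like, the sketch below fits one Gaussian mean per language with a single variance shared by all languages, which makes the decision boundaries linear. It is not taken from the L2F system: the diagonal covariance, the toy two-dimensional vectors, the language labels and the function names `train_linear_gaussian` and `score` are all assumptions made for the example.

```python
from collections import defaultdict


def train_linear_gaussian(vectors, labels):
    """Fit one mean per class and a single shared diagonal variance."""
    dim = len(vectors[0])
    by_class = defaultdict(list)
    for vec, lab in zip(vectors, labels):
        by_class[lab].append(vec)
    means = {
        lab: [sum(v[d] for v in vecs) / len(vecs) for d in range(dim)]
        for lab, vecs in by_class.items()
    }
    # Pooled within-class variance, shared by every class (assumed diagonal).
    var = [0.0] * dim
    for vec, lab in zip(vectors, labels):
        for d in range(dim):
            var[d] += (vec[d] - means[lab][d]) ** 2
    var = [max(v / len(vectors), 1e-6) for v in var]
    return means, var


def score(vec, means, var):
    """Return per-class log-likelihoods up to a class-independent constant."""
    return {
        lab: -0.5 * sum((x - m) ** 2 / s for x, m, s in zip(vec, mu, var))
        for lab, mu in means.items()
    }


# Toy 2-D stand-ins for i-vectors from two hypothetical languages.
train_x = [[1.0, 0.2], [0.8, -0.1], [-1.0, 0.1], [-0.9, -0.2]]
train_y = ["por", "por", "eng", "eng"]

means, var = train_linear_gaussian(train_x, train_y)
scores = score([0.7, 0.0], means, var)
best = max(scores, key=scores.get)
```

Because the variance is shared, the quadratic term in the test vector is the same for every language and cancels when scores are compared, so each language score is effectively a linear function of the input. A real system would typically use much higher-dimensional i-vectors and may use a full shared covariance matrix.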
