Optimizing PLLR Features for Spoken Language Recognition

Phone Log-Likelihood Ratios (PLLRs) have recently been introduced as features for spoken language and speaker recognition systems. This representation has proven to be an effective way of encoding acoustic-phonotactic information in frame-level vectors, which can be easily plugged into state-of-the-art systems. In previous work, we began searching for reduced representations of PLLRs as a means of reducing computational costs. In this paper, we extend this search by looking for the optimal trade-off between feature vector size and system performance. Results obtained by applying Principal Component Analysis (PCA) projection to the PLLR space are analyzed extensively. In addition, to evaluate the effect of using larger temporal contexts, a Shifted Delta transformation is applied to highly reduced sets of PCA-projected PLLR features (and its optimal configuration is explored), leading to further performance improvements over the best PCA-projected PLLR set.
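The abstract names three processing steps (frame-level PLLRs, PCA projection, Shifted Delta stacking) without spelling them out. The sketch below is one possible reading of that pipeline under stated assumptions, not the authors' implementation. It assumes the common PLLR definition (each phone's posterior compared against the average posterior of the remaining N - 1 phones), fits PCA with plain power iteration and deflation to stay dependency-free, and uses the usual N-d-P-k shifted-delta layout borrowed from shifted delta cepstra, where N is here the number of retained PCA components. The function names `pllr`, `pca_basis`, `project` and `shifted_delta`, and the edge-clamping policy, are hypothetical.

```python
import math
from typing import List, Tuple

Vector = List[float]
Matrix = List[Vector]


def pllr(posteriors: Vector) -> Vector:
    """PLLRs for one frame, assuming phone posteriors in (0, 1) that sum to 1.

    Assumed definition: PLLR_i = log( p_i / ((1 - p_i) / (N - 1)) ),
    i.e. phone i against the average posterior of the other N - 1 phones.
    """
    n = len(posteriors)
    return [math.log(p * (n - 1) / (1.0 - p)) for p in posteriors]


def pca_basis(frames: Matrix, k: int, iters: int = 200) -> Tuple[Vector, Matrix]:
    """Mean and top-k principal directions via power iteration + deflation."""
    dim, count = len(frames[0]), len(frames)
    mean = [sum(f[j] for f in frames) / count for j in range(dim)]
    centered = [[f[j] - mean[j] for j in range(dim)] for f in frames]
    # Sample covariance matrix of the centered frames (dim x dim).
    cov = [[sum(r[a] * r[b] for r in centered) / (count - 1) for b in range(dim)]
           for a in range(dim)]
    basis: Matrix = []
    for _ in range(k):
        # Non-uniform unit start vector, so it is unlikely to be orthogonal
        # to the dominant direction.
        v = [1.0 + 0.1 * j for j in range(dim)]
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(cov[a][b] * v[b] for b in range(dim)) for a in range(dim)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:
                break
            v = [x / norm for x in w]
        # Rayleigh quotient: eigenvalue estimate for the direction found.
        lam = sum(v[a] * cov[a][b] * v[b] for a in range(dim) for b in range(dim))
        basis.append(v)
        # Deflation: remove this component so the next pass finds the next one.
        cov = [[cov[a][b] - lam * v[a] * v[b] for b in range(dim)]
               for a in range(dim)]
    return mean, basis


def project(frame: Vector, mean: Vector, basis: Matrix) -> Vector:
    """Project one PLLR frame onto the PCA basis (keeps len(basis) dims)."""
    centered = [x - m for x, m in zip(frame, mean)]
    return [sum(b * c for b, c in zip(vec, centered)) for vec in basis]


def shifted_delta(frames: Matrix, d: int, p: int, k: int) -> Matrix:
    """Shifted deltas: for frame t, stack c(t+i*p+d) - c(t+i*p-d), i = 0..k-1.

    Indices outside the utterance are clamped to the first/last frame
    (an assumed padding policy; others are possible).
    """
    last = len(frames) - 1

    def at(t: int) -> Vector:
        return frames[min(max(t, 0), last)]

    out: Matrix = []
    for t in range(len(frames)):
        row: Vector = []
        for i in range(k):
            plus, minus = at(t + i * p + d), at(t + i * p - d)
            row.extend(a - b for a, b in zip(plus, minus))
        out.append(row)
    return out
```

In this reading, PLLRs are computed per frame, the PCA basis is fitted on training frames, each utterance is projected, and the shifted deltas are then taken over the projected frames, e.g. `shifted_delta([project(f, mean, basis) for f in utt], d=1, p=3, k=7)`; those parameter values are purely illustrative and are not the configuration explored in the paper. A practical system would compute the PCA basis with an eigendecomposition routine such as `numpy.linalg.eigh` rather than power iteration.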
