Bayesian Networks to Model the Variability of Speaker Verification Scores in Adverse Environments

State-of-the-art speaker recognition technology achieves high performance in controlled conditions. However, when the speech segments are degraded by distortions such as noise or reverberation, performance can deteriorate severely. This motivated us to investigate how score distributions diverge from the ideal ones in degraded conditions. We propose a Bayesian network model that assumes the existence of two scores: one observed and one hidden. The observed, or noisy, score is the one produced by the speaker verification system. The hidden, or clean, score is the ideal score that we would obtain in a trial with high-quality speech. A set of quality measures relates the two scores. We applied this network to two tasks. The first task is rejecting unreliable trials, i.e., trials for which we cannot determine with confidence whether they are target or non-target. We show that this method outperforms previous approaches based on a different type of Bayesian network. The second task is computing an improved likelihood ratio that depends on the quality measures. This ratio improved calibration in noisy conditions.
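The abstract does not state the model's equations, so the following Python sketch only illustrates the general idea under simplifying assumptions of our own, not the paper's. It uses a single quality measure (an SNR in dB) and Gaussian clean-score distributions per class. It also assumes a noisy score equal to the clean score plus an SNR-dependent offset and Gaussian noise. The class `ScoreModel`, its parameter values, the `shift` and `noise_var` functions, and the confidence-threshold rejection rule in `decide` are all hypothetical. Because everything is Gaussian, the hidden clean score can be integrated out in closed form, which gives a quality-dependent likelihood ratio. The same posterior can flag trials whose target/non-target decision is not confident enough.

```python
import math
from dataclasses import dataclass


def log_normal_pdf(x: float, mean: float, var: float) -> float:
    """Log density of the univariate Gaussian N(mean, var) evaluated at x."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def sigmoid(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0.0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


@dataclass
class ScoreModel:
    """Hypothetical model: class -> clean score s -> noisy score s_hat, given SNR q."""

    # Clean-score Gaussians per class (illustrative values, not from the paper).
    mu_tar: float = 4.0
    var_tar: float = 1.0
    mu_non: float = -4.0
    var_non: float = 1.0

    def shift(self, snr_db: float) -> float:
        # Assumed score offset: degradation pulls scores down below 30 dB.
        return -0.1 * max(0.0, 30.0 - snr_db)

    def noise_var(self, snr_db: float) -> float:
        # Assumed extra score variance, growing as the SNR drops.
        return 0.05 + 0.02 * max(0.0, 30.0 - snr_db)

    def llr(self, noisy_score: float, snr_db: float) -> float:
        """Quality-dependent LLR: log p(s_hat | tar, q) - log p(s_hat | non, q).

        With s ~ N(mu, var) and s_hat = s + shift(q) + noise, noise ~ N(0, noise_var(q)),
        integrating out the hidden clean score gives
        s_hat | class, q ~ N(mu + shift(q), var + noise_var(q)).
        """
        b, v = self.shift(snr_db), self.noise_var(snr_db)
        return (log_normal_pdf(noisy_score, self.mu_tar + b, self.var_tar + v)
                - log_normal_pdf(noisy_score, self.mu_non + b, self.var_non + v))

    def posterior_target(self, noisy_score: float, snr_db: float,
                         prior_tar: float = 0.5) -> float:
        """P(target | s_hat, q) from the LLR and the target prior."""
        prior_logit = math.log(prior_tar / (1.0 - prior_tar))
        return sigmoid(self.llr(noisy_score, snr_db) + prior_logit)

    def clean_score_mean(self, noisy_score: float, snr_db: float,
                         prior_tar: float = 0.5) -> float:
        """Posterior mean E[s | s_hat, q] of the hidden clean score (two-class mixture)."""
        b, v = self.shift(snr_db), self.noise_var(snr_db)
        p_tar = self.posterior_target(noisy_score, snr_db, prior_tar)
        class_means = []
        for mu, var in ((self.mu_tar, self.var_tar), (self.mu_non, self.var_non)):
            # Gaussian prior N(mu, var) on s times Gaussian likelihood N(s_hat - b; s, v).
            post_var = 1.0 / (1.0 / var + 1.0 / v)
            class_means.append(post_var * (mu / var + (noisy_score - b) / v))
        return p_tar * class_means[0] + (1.0 - p_tar) * class_means[1]

    def decide(self, noisy_score: float, snr_db: float,
               prior_tar: float = 0.5, min_confidence: float = 0.95) -> str:
        """Return 'target', 'non-target', or 'unreliable' (assumed rejection rule)."""
        p = self.posterior_target(noisy_score, snr_db, prior_tar)
        if max(p, 1.0 - p) < min_confidence:
            return "unreliable"
        return "target" if p >= 0.5 else "non-target"


model = ScoreModel()
for score, snr in [(0.0, 30.0), (0.0, 0.0), (4.0, 30.0)]:
    print(score, snr, round(model.llr(score, snr), 2), model.decide(score, snr))
```

With these illustrative parameters, a raw score of 0.0 is rejected as unreliable at 30 dB but accepted as a target trial at 0 dB, because the sketch expects low-SNR scores to be shifted downward. In practice, the relationships between scores and quality measures would be estimated from development data rather than fixed by hand.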
