Handling recordings acquired simultaneously over multiple channels with PLDA

In some speaker recognition scenarios, conversations are recorded simultaneously over multiple channels. This is the case for the interviews in the NIST SRE datasets. To take advantage of this, we propose a modification of the PLDA model that includes two different inter-session variability terms. The first term is tied across all the recordings belonging to the same conversation, whereas the second is not. Thus, the former is mainly intended to capture the variability due to the phonetic content of the conversation, while the latter is intended to capture the channel variability. We test this approach on the NIST SRE12 core condition, using multiple channels per interview to enroll the speakers. Compared to standard PLDA with by-the-book scoring, the proposed approach improves the minimum DCF by 26–29% on telephone speech and by 1–8% on interviews.
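
As a rough illustration of the model described above, one possible way to write it down is sketched here. The notation is an assumption made for this sketch and is not taken from the abstract: $i$ indexes the speaker, $j$ the conversation, and $k$ the channel over which conversation $j$ was recorded; $\mathbf{V}$ and $\mathbf{U}$ are hypothetical loading matrices, and the actual parameterization in the paper may differ.

```latex
% Sketch under assumed notation (not the paper's own symbols).
% phi_{ijk}: i-vector of speaker i, conversation j, channel k.
\begin{equation}
  \boldsymbol{\phi}_{ijk}
    = \boldsymbol{\mu}
    + \mathbf{V}\,\mathbf{y}_{i}              % speaker term, shared by all recordings of speaker i
    + \mathbf{U}\,\mathbf{x}_{ij}             % first inter-session term, tied across channels k of conversation j
    + \boldsymbol{\epsilon}_{ijk}             % second inter-session term, separate for every recording
\end{equation}
% Assumed priors for the sketch:
%   y_i ~ N(0, I),  x_ij ~ N(0, I),  eps_ijk ~ N(0, Sigma)
```

In this sketch, $\mathbf{x}_{ij}$ plays the role of the tied term (mainly the phonetic content of the conversation) and $\boldsymbol{\epsilon}_{ijk}$ the untied term (mainly the channel). Dropping $\mathbf{U}\,\mathbf{x}_{ij}$ leaves a single inter-session term per recording, which corresponds to the standard PLDA form used as the baseline.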