Evaluation of Nuance Forensics 9.2 and 11.1 under conditions reflecting those of a real forensic voice comparison case (forensic_eval_01)

Abstract: Two automatic speaker recognition systems, Nuance Forensics 9.2 and 11.1, were tested as part of the Speech Communication virtual special issue “Multi-laboratory evaluation of forensic voice comparison systems under conditions reflecting those of a real forensic case (forensic_eval_01).” Nuance Forensics 9.2 is an i-vector PLDA system; Nuance Forensics 11.1 combines i-vector PLDA technology with some Deep Neural Network functionality. Both systems were tested in three variants. The first and second variants differ in the size of the “Reference Population” (42 vs. 105 speakers). The third variant differs from the first two in its “Background Model”: the first two use a system default, whereas the third uses a dedicated model drawn from the forensic_eval_01 training data. The Reference Population is used for calibration, that is, for converting voice comparison scores into calibrated likelihood ratios; the Background Model is used for normalising the scores (Adaptive S-norm). Comparing the three variants showed that, for both systems, a Background Model dedicated to the conditions of the case performs better than the system default. The size of the Reference Population, however, made no difference. Comparing the two systems showed that the system that includes Deep Neural Network technology performs better than the pure i-vector PLDA system.
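As an illustration of the score normalisation step mentioned in the abstract, the following is a minimal sketch of adaptive symmetric normalisation (Adaptive S-norm) in its generic textbook form. It is not the Nuance Forensics implementation, whose internals are not described here. The function names, the `top_n` parameter, and the toy cohort scores are assumptions made for the example. The cohort scores stand for the scores obtained by comparing each of the two recordings against the speakers in the Background Model.

```python
from statistics import fmean, pstdev


def _top_stats(cohort_scores, top_n):
    """Mean and standard deviation of the top_n highest cohort scores.

    Keeping only the closest cohort speakers is what makes the
    normalisation "adaptive" (hypothetical helper, for illustration).
    """
    top = sorted(cohort_scores, reverse=True)[:top_n]
    # Guard against a zero standard deviation on degenerate cohorts.
    return fmean(top), max(pstdev(top), 1e-8)


def adaptive_s_norm(raw_score, cohort_scores_a, cohort_scores_b, top_n=2):
    """Average of the two z-normalised versions of raw_score.

    cohort_scores_a: scores of recording A against the background cohort.
    cohort_scores_b: scores of recording B against the background cohort.
    """
    mean_a, std_a = _top_stats(cohort_scores_a, top_n)
    mean_b, std_b = _top_stats(cohort_scores_b, top_n)
    return 0.5 * ((raw_score - mean_a) / std_a + (raw_score - mean_b) / std_b)


# Toy usage: both recordings score 0..3 against four cohort speakers.
normalised = adaptive_s_norm(4.0, [0.0, 1.0, 2.0, 3.0], [0.0, 1.0, 2.0, 3.0])
```

In this sketch the cohort statistics depend on which background speakers score highest against each recording, which suggests why a Background Model matched to the recording conditions of the case could shift the normalised scores relative to a generic default.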
