The impact in forensic voice comparison of lack of calibration and of mismatched conditions between the known-speaker recording and the relevant-population sample recordings.

In a 2017 New South Wales case, a forensic practitioner conducted a forensic voice comparison using a Gaussian mixture model-universal background model (GMM-UBM) system. The practitioner did not report the results of empirical tests of the system's performance under conditions reflecting those of the case under investigation. The practitioner trained the model for the numerator of the likelihood ratio using the known-speaker recording, but trained the model for the denominator of the likelihood ratio (the UBM) using high-quality audio recordings rather than recordings that reflected the conditions of the known-speaker recording. As a result, the degree of mismatch between the numerator model and the questioned-speaker recording differed from the degree of mismatch between the denominator model and the questioned-speaker recording. In addition, the practitioner did not calibrate the output of the system. The present paper empirically tests the performance of a replication of the practitioner's system. It also tests a system in which the UBM was trained on known-speaker-condition data and whose output was empirically calibrated. The performance of the former system was very poor, and the performance of the latter was substantially better.
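To illustrate what calibrating the output of a system can involve, the following is a minimal sketch that maps uncalibrated comparison scores to log likelihood ratios with a linear transform fitted by logistic regression. The abstract does not say which calibration method was used, so logistic regression is an assumption here, chosen because it is a common option. The scores are synthetic, and all names (`fit_calibration`, `to_log_lr`, the score distributions) are hypothetical.

```python
import numpy as np

def fit_calibration(same_scores, diff_scores, n_iter=50):
    """Fit ln LR = a + b * score by weighted logistic regression (Newton's method).

    Each class gets total weight 0.5, so the implied prior log odds are 0
    and the fitted log odds can be read directly as a log likelihood ratio.
    """
    x = np.concatenate([same_scores, diff_scores])
    y = np.concatenate([np.ones(len(same_scores)), np.zeros(len(diff_scores))])
    w = np.where(y == 1, 0.5 / len(same_scores), 0.5 / len(diff_scores))
    X = np.column_stack([np.ones_like(x), x])
    theta = np.zeros(2)
    for _ in range(n_iter):
        logits = np.clip(X @ theta, -30.0, 30.0)  # guard against overflow in exp
        p = 1.0 / (1.0 + np.exp(-logits))
        grad = X.T @ (w * (p - y))
        hess = X.T @ (X * (w * p * (1.0 - p))[:, None])
        theta = theta - np.linalg.solve(hess, grad)
    return theta[0], theta[1]

def to_log_lr(score, a, b):
    """Convert a raw score to a natural-log likelihood ratio."""
    return a + b * score

# Synthetic illustration (assumed data, not from the paper).
# Underlying variable: same-speaker ~ N(+1, 1), different-speaker ~ N(-1, 1),
# for which the true ln LR is 2 * x. The "system" outputs score = 10 * x + 20,
# i.e. informative but miscalibrated scores.
rng = np.random.default_rng(0)
same = 10.0 * rng.normal(1.0, 1.0, 2000) + 20.0
diff = 10.0 * rng.normal(-1.0, 1.0, 2000) + 20.0

a, b = fit_calibration(same, diff)
# In theory ln LR = 2 * (score - 20) / 10 = -4 + 0.2 * score,
# so the fitted values should be close to a = -4, b = 0.2.
print(f"a = {a:.2f}, b = {b:.3f}")
print(f"ln LR at score 20: {to_log_lr(20.0, a, b):.2f}")
```

In practice, the calibration data would be scores from same-speaker and different-speaker pairs recorded under conditions reflecting those of the case, which is the point the abstract makes about mismatched conditions.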
