Statistical Models in Forensic Voice Comparison

This chapter describes signal-processing and statistical-modeling techniques that are commonly used to calculate likelihood ratios in human-supervised automatic approaches to forensic voice comparison. The techniques described include mel-frequency cepstral coefficient (MFCC) feature extraction, Gaussian mixture model - universal background model (GMM-UBM) systems, i-vector - probabilistic linear discriminant analysis (i-vector PLDA) systems, deep neural network (DNN) based systems (including senone posterior i-vectors, bottleneck features, and embeddings / x-vectors), mismatch compensation, and score-to-likelihood-ratio conversion (also known as calibration). Empirical validation of forensic-voice-comparison systems is also covered. The aim of the chapter is to bridge the gap between general introductions to forensic voice comparison and the highly technical automatic-speaker-recognition literature from which most of these techniques are drawn. Knowledge of the likelihood-ratio framework for the evaluation of forensic evidence is assumed. It is hoped that the material presented here will be of value to students of forensic voice comparison and to researchers interested in statistical-modeling techniques that could also be applied to data from other branches of forensic science.


[110]  Michael Jessen,et al.  Experiments with Two Forensic Automatic Speaker Comparison Systems Using Reference Populations that (Mis)Match the Test Language , 2017 .