Gaussian Mixture Models of Between-Source Variation for Likelihood Ratio Computation from Multivariate Data

In forensic science, trace evidence found at a crime scene and on a suspect has to be evaluated from the measurements performed on it, usually in the form of multivariate data (for example, several chemical compounds or physical characteristics). To assess the strength of that evidence, the likelihood ratio framework is being increasingly adopted. Several methods have been derived to obtain likelihood ratios directly from univariate or multivariate data by modelling both the variation between observations (or features) coming from the same source (within-source variation) and the variation between observations coming from different sources (between-source variation). In the widely used multivariate kernel likelihood ratio, the within-source variation is assumed to be normally distributed and constant across sources, and the between-source variation is modelled through a kernel density function (KDF). To better fit the observed distribution of the between-source variation, this paper presents a different approach in which a Gaussian mixture model (GMM) is used instead of a KDF. As will be shown, this approach provides better-calibrated likelihood ratios, as measured by the log-likelihood-ratio cost (Cllr), in experiments performed on freely available forensic datasets involving different types of trace evidence: inks, glass fragments and car paints.
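
As a rough illustration of the two-level model described above, the following sketch computes a log-likelihood ratio for a single (univariate) feature when the within-source variation is normal with a common variance and the between-source distribution of source means is a GMM, and then evaluates Cllr on a set of likelihood ratios. It is a minimal sketch under stated assumptions, not the implementation used in the paper: the univariate simplification, all function names, and the parameter values in the usage lines are hypothetical, and the GMM parameters are assumed to have been estimated beforehand (for example with the EM algorithm). In the multivariate case the variances would be replaced by covariance matrices.

```python
import math


def log_normal(x, mean, var):
    """Log density of a univariate normal N(x; mean, var)."""
    return -0.5 * (math.log(2.0 * math.pi * var) + (x - mean) ** 2 / var)


def log_joint_same_source(y1, y2, mean, var_between, var1, var2):
    """Log density of (y1, y2) when both sample means share one source mean.

    Integrating out a source mean distributed as N(mean, var_between) gives a
    bivariate normal with mean (mean, mean), variances var_between + var1 and
    var_between + var2, and covariance var_between.
    """
    a = var_between + var1
    b = var_between + var2
    det = a * b - var_between ** 2
    d1, d2 = y1 - mean, y2 - mean
    quad = (b * d1 * d1 - 2.0 * var_between * d1 * d2 + a * d2 * d2) / det
    return -math.log(2.0 * math.pi) - 0.5 * math.log(det) - 0.5 * quad


def logsumexp(values):
    """Numerically stable log(sum(exp(v) for v in values))."""
    m = max(values)
    return m + math.log(sum(math.exp(v - m) for v in values))


def gmm_log_lr(y1, n1, y2, n2, weights, means, variances, var_within):
    """Natural-log LR for two sample means (y1 from n1 and y2 from n2 measurements).

    weights, means, variances: parameters of the between-source GMM (assumed given).
    var_within: common within-source variance (assumed given).
    """
    v1, v2 = var_within / n1, var_within / n2
    comps = list(zip(weights, means, variances))
    # Same-source hypothesis: both samples share one mean drawn from the GMM.
    num = logsumexp([math.log(w) + log_joint_same_source(y1, y2, m, s, v1, v2)
                     for w, m, s in comps])
    # Different-source hypothesis: each sample has its own mean drawn from the GMM.
    den1 = logsumexp([math.log(w) + log_normal(y1, m, s + v1) for w, m, s in comps])
    den2 = logsumexp([math.log(w) + log_normal(y2, m, s + v2) for w, m, s in comps])
    return num - den1 - den2


def cllr(lrs_same, lrs_diff):
    """Log-likelihood-ratio cost from LRs of same-source and different-source trials."""
    same = sum(math.log2(1.0 + 1.0 / lr) for lr in lrs_same) / len(lrs_same)
    diff = sum(math.log2(1.0 + lr) for lr in lrs_diff) / len(lrs_diff)
    return 0.5 * (same + diff)


# Hypothetical two-component between-source GMM, for illustration only.
weights, means, variances = [0.5, 0.5], [0.0, 10.0], [1.0, 1.0]
log_lr_close = gmm_log_lr(0.1, 3, 0.1, 3, weights, means, variances, var_within=0.1)
log_lr_far = gmm_log_lr(0.1, 3, 9.9, 3, weights, means, variances, var_within=0.1)
```

Here `log_lr_close` supports the same-source hypothesis (positive log LR) and `log_lr_far` supports the different-source hypothesis (negative log LR). A system that always outputs LR = 1 has Cllr = 1, so lower values indicate more informative and better-calibrated likelihood ratios.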
