Likelihood ratio estimation in forensic identification using similarity and rarity

Forensic identification is the task of determining whether or not observed evidence arose from a known source. It is useful to associate probabilities with identification/exclusion opinions, either for presentation in court or to evaluate the discriminative power of a given set of attributes. At present, in most forensic domains outside of DNA evidence, such probabilistic statements cannot be made because the necessary probability distributions cannot be computed with reasonable accuracy, although the probabilistic approach itself is well understood. In principle, it involves determining a likelihood ratio (LR): the ratio of the joint probability of the evidence and source under the identification hypothesis (that the evidence came from the source) to that under the exclusion hypothesis (that the evidence did not arise from the source). Evaluating the joint probability is computationally intractable when the number of variables is even moderately large. It is also statistically infeasible, since the number of parameters to be estimated from the data is exponential in the number of variables. An approximate method is to replace the joint probability by another probability: that of the distance (or similarity) between evidence and object under the two hypotheses. While this reduces the complexity to linear in the number of variables, it is an oversimplification that leads to errors. We consider a third method, which decomposes the LR into a product of two factors, one based on distance and the other on rarity. This result, which is exact for the univariate Gaussian case, has intuitive appeal: forensic examiners assign high importance to rare feature values in the evidence and low importance to common feature values. We generalize this approach to more complex data such as vectors and graphs, which makes LR estimation computationally tractable.
Empirical evaluations of the three methods, done with several data types (continuous, binary, multinomial, and graph) and several modalities (handwriting with binary features, handwriting with multinomial features, and footwear impressions with continuous features), show that the distance and rarity method is significantly better than the distance-only method.

Highlights

We formulate the probability of identification in terms of a likelihood ratio (LR).
We factor the LR into a product of two factors, one based on distance and the other on rarity.
We describe methods for computing distance in forensics for different data types.
We provide empirical evaluations of three methods of LR computation.
The proposed method outperforms the distance-only method on forensic data.
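
The distance and rarity factorization in the univariate Gaussian case can be illustrated with a short numerical sketch. It is not taken from the paper: it assumes a standard two-level Gaussian model in which each source has a mean drawn from N(mu, tau2) and each measurement adds noise with variance sigma2. The function names and the symbols mu, tau2, and sigma2 are hypothetical. Under these assumptions the joint LR equals a factor that depends only on the difference between evidence and object (distance) times a factor that depends only on how far their average lies from the population mean (rarity).

```python
import math


def normal_pdf(x, mean, var):
    """Density of a univariate Gaussian with the given mean and variance."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def bivariate_pdf(x, y, mean, var, cov):
    """Density of a bivariate Gaussian with a common mean and variance and covariance cov."""
    det = var * var - cov * cov
    dx, dy = x - mean, y - mean
    quad = (var * dx * dx - 2.0 * cov * dx * dy + var * dy * dy) / det
    return math.exp(-quad / 2.0) / (2.0 * math.pi * math.sqrt(det))


def joint_lr(e, o, mu, tau2, sigma2):
    """LR from the joint density of evidence e and object o (assumed model)."""
    total = tau2 + sigma2
    # Identification hypothesis: e and o share a source, so they covary by tau2.
    same_source = bivariate_pdf(e, o, mu, total, tau2)
    # Exclusion hypothesis: e and o come from different sources, so they are independent.
    different_source = normal_pdf(e, mu, total) * normal_pdf(o, mu, total)
    return same_source / different_source


def distance_factor(e, o, tau2, sigma2):
    """LR of the difference d = e - o under the two hypotheses."""
    d = e - o
    return normal_pdf(d, 0.0, 2.0 * sigma2) / normal_pdf(d, 0.0, 2.0 * (tau2 + sigma2))


def rarity_factor(e, o, mu, tau2, sigma2):
    """LR of the average m = (e + o) / 2; it grows as m moves away from mu."""
    m = (e + o) / 2.0
    return normal_pdf(m, mu, tau2 + sigma2 / 2.0) / normal_pdf(m, mu, (tau2 + sigma2) / 2.0)


if __name__ == "__main__":
    # Illustrative values only; they are not data from the paper.
    e, o, mu, tau2, sigma2 = 2.1, 1.9, 0.0, 1.0, 0.04
    full = joint_lr(e, o, mu, tau2, sigma2)
    product = distance_factor(e, o, tau2, sigma2) * rarity_factor(e, o, mu, tau2, sigma2)
    print("joint LR:", full)
    print("distance x rarity:", product)
```

In this sketch, two pairs with the same difference e - o get the same distance factor, but the pair whose values are farther from mu gets the larger rarity factor and therefore the larger LR. This matches the intuition that agreement on a rare feature value is stronger evidence than agreement on a common one.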
