Validity of the best practice in splitting data for hold-out validation strategy as performed on the ink strokes in the context of forensic science

Abstract

External testing (ET), also known as hold-out validation, is currently considered one of the most reliable ways to estimate the predictive ability of a statistical model. One safeguard against impermissible peeking in ET is to ensure that all replicates of a particular sample are included in either the training set or the test set, never both. Suppose a sample X1 consists of two replicates, X1a and X1b. If X1a is placed in the training set and X1b in the test set, the model is said to benefit from impermissible peeking. Because X1b is essentially a repeat measurement of a sample already seen during training, the resulting model is expected to predict the test set easily, which gives an over-optimistic estimate of its performance. In forensic document examination, an individual pen (IP) can be used to produce multiple ink strokes. In practice, pens are mass-produced, with one large tank of ink used to fill a great many IPs. In other words, ink strokes produced by different IPs of the same pen model originate from a single source (i.e. the same tank of ink). How, then, should ink strokes be treated with respect to the safeguard above: as replicates or as independent samples? The aim of this work is to investigate the validity of this safeguard for splitting datasets in the hold-out validation strategy (i.e. ET) in the domain of forensic pen ink analysis. Infrared (IR) spectra of blue gel pen inks were used to demonstrate its practical aspects. The IR spectral data were collected from 1361 ink strokes originating from 273 IPs of 23 pen models and 10 pen brands.
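The safeguard can be illustrated with a minimal Python sketch. It assumes each replicate carries a label such as X1a that maps to its parent sample; all function and variable names are hypothetical. `split_by_group` assigns whole samples to one set, `split_by_item` ignores the grouping, and `peeking_groups` reports the samples whose replicates straddle the split, i.e. the cases of impermissible peeking. In the ink setting, the "sample" would be an IP (or, under the single-tank argument, a pen model) and the replicates would be its ink strokes.

```python
import random


def split_by_group(items, group_of, train_frac=0.7, seed=0):
    """Assign whole groups (all replicates of one sample) to a single set."""
    groups = sorted({group_of[i] for i in items})
    random.Random(seed).shuffle(groups)
    n_train = round(train_frac * len(groups))
    train_groups = set(groups[:n_train])
    train = [i for i in items if group_of[i] in train_groups]
    test = [i for i in items if group_of[i] not in train_groups]
    return train, test


def split_by_item(items, train_frac=0.7, seed=0):
    """Ignore grouping: replicates of one sample may land in different sets."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_train = round(train_frac * len(shuffled))
    return shuffled[:n_train], shuffled[n_train:]


def peeking_groups(train, test, group_of):
    """Samples with at least one replicate in each set (impermissible peeking)."""
    return {group_of[i] for i in train} & {group_of[i] for i in test}


# Ten hypothetical samples X1..X10, each measured as two replicates (a and b).
group_of = {f"X{s}{r}": f"X{s}" for s in range(1, 11) for r in "ab"}
items = sorted(group_of)

train, test = split_by_group(items, group_of)
print("grouped split, peeking samples:", peeking_groups(train, test, group_of))

train, test = split_by_item(items)
print("item split, peeking samples:", sorted(peeking_groups(train, test, group_of)))
```

With the grouped split, the reported set is empty by construction. The item-level split typically leaves several samples with one replicate on each side.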
Iterative stratified random sampling was employed to prepare 1000 pairs of training and test sets, split at a ratio of 7:3 under two different principles: (a) set IP, in which selection was conducted at the IP level so that all ink strokes originating from a particular IP were included in either the training set or the test set only; and (b) set NIP, in which the ink strokes of a particular IP were allowed to be spread across the training and test sets. For each pair of training and test sets, a series of 50 PLS-DA models was constructed by incrementally including the first 1 to 50 PLS components; these models were then validated via auto-prediction (i.e. prediction of the training set itself) and ET. The performances of the IP and NIP model series were then compared with respect to (a) model accuracy, (b) model stability, and (c) model fitting. In conclusion, the NIP model series shows no evidence of benefiting from impermissible peeking, since the NIP and IP model series exhibit quite similar performances in all three aspects.
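The two split principles and an incremental PLS-DA model series can be sketched in Python with NumPy on synthetic data. This is an illustration under stated assumptions, not the authors' implementation. The following are all assumptions:

- the synthetic spectra;
- the reduced sizes (3 pen models × 6 IPs × 5 strokes, 40 variables, 10 components rather than 50);
- stratification by pen model;
- the hypothetical helper names `ip_split`, `nip_split` and `plsda_fit`;
- the PLS-DA formulation itself. Here it is a NIPALS PLS2 regression on one-hot class indicators, with each object assigned to the class of highest predicted value, which is one common variant.

Auto-prediction accuracy is computed on the training set and ET accuracy on the held-out test set.

```python
import numpy as np


def plsda_fit(X, labels, n_comp, tol=1e-10, max_iter=500):
    """Fit PLS2 (NIPALS) on one-hot class indicators; return a class predictor."""
    classes = np.unique(labels)
    Y = (labels[:, None] == classes[None, :]).astype(float)  # one-hot dummy matrix
    x_mean, y_mean = X.mean(axis=0), Y.mean(axis=0)
    E, F = X - x_mean, Y - y_mean  # mean-centred X and Y
    W, P, Q = [], [], []
    for _ in range(n_comp):
        u = F[:, [np.argmax(F.var(axis=0))]]  # start from the most variable Y column
        t_old = np.zeros((X.shape[0], 1))
        for _ in range(max_iter):
            w = E.T @ u
            w /= np.linalg.norm(w)  # X weights
            t = E @ w  # X scores
            q = F.T @ t / (t.T @ t)  # Y loadings
            u = F @ q / (q.T @ q)  # Y scores
            if np.linalg.norm(t - t_old) <= tol * np.linalg.norm(t):
                break
            t_old = t
        p = E.T @ t / (t.T @ t)  # X loadings
        E, F = E - t @ p.T, F - t @ q.T  # deflate before the next component
        W.append(w)
        P.append(p)
        Q.append(q)
    W, P, Q = np.hstack(W), np.hstack(P), np.hstack(Q)
    B = W @ np.linalg.solve(P.T @ W, Q.T)  # coefficients for centred X
    return lambda Xn: classes[np.argmax((Xn - x_mean) @ B + y_mean, axis=1)]


def ip_split(y, pen, rng, train_frac=0.7):
    """Stratified IP-level split: every stroke of a pen goes to the same set."""
    train = np.zeros(len(y), dtype=bool)
    for m in np.unique(y):
        pens = rng.permutation(np.unique(pen[y == m]))
        train |= np.isin(pen, pens[: int(round(train_frac * len(pens)))])
    return train


def nip_split(y, rng, train_frac=0.7):
    """Stratified stroke-level split: strokes of one pen may fall in both sets."""
    train = np.zeros(len(y), dtype=bool)
    for m in np.unique(y):
        idx = rng.permutation(np.flatnonzero(y == m))
        train[idx[: int(round(train_frac * len(idx)))]] = True
    return train


# Synthetic "spectra": pen-model mean + pen-to-pen offset + stroke-level noise.
rng = np.random.default_rng(0)
n_models, pens_per_model, strokes_per_pen, n_vars = 3, 6, 5, 40
model_means = rng.normal(0.0, 1.0, (n_models, n_vars))
X, y, pen = [], [], []
for m in range(n_models):
    for p in range(pens_per_model):
        pen_offset = rng.normal(0.0, 0.5, n_vars)
        for _ in range(strokes_per_pen):
            X.append(model_means[m] + pen_offset + rng.normal(0.0, 0.5, n_vars))
            y.append(m)
            pen.append(m * pens_per_model + p)
X, y, pen = np.array(X), np.array(y), np.array(pen)

splits = {"IP": ip_split(y, pen, rng), "NIP": nip_split(y, rng)}
for name, train in splits.items():
    for k in range(1, 11):  # model series: 1..k components included incrementally
        predict = plsda_fit(X[train], y[train], k)
        auto_acc = np.mean(predict(X[train]) == y[train])  # auto-prediction
        et_acc = np.mean(predict(X[~train]) == y[~train])  # external test
        print(f"{name:>3}, {k:2d} components: auto {auto_acc:.2f}, ET {et_acc:.2f}")
```

To mirror the study's design, the splitting would be repeated with 1000 different seeds per principle, with `X` holding the IR spectra, `y` the class labels and `pen` the IP identifiers. A single synthetic run like this one says nothing about the study's actual results.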