Quantifying the association between discrete event time series with applications to digital forensics

We consider the problem of quantifying the degree of association between pairs of discrete event time series, with potential applications in forensic and cybersecurity settings. We focus in particular on the case where two associated event series exhibit temporal clustering, such that the occurrence of one type of event at a particular time increases the likelihood that an event of the other type will also occur nearby in time. We pursue a non-parametric approach to the problem and investigate various score functions to quantify association, including characteristics of marked point processes and summary statistics of interevent times. Two techniques are proposed for assessing the significance of the measured degree of association: a population-based approach to calculating score-based likelihood ratios when a sample from a relevant population is available, and a resampling approach to computing coincidental match probabilities when only a single pair of event series is available. The methods are applied to simulated data and to two real-world data sets consisting of logs of computer activity, and they achieve accurate results across all data sets.
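As a rough illustration of the single-pair setting, the sketch below pairs one possible interevent-time score (the mean gap from each event in one series to the nearest event in the other) with a resampling estimate of a coincidental match probability. The abstract does not specify the score functions or the resampling scheme in detail, so everything here is an assumption made for illustration: the function names `mean_nearest_gap` and `coincidental_match_probability`, the `period` parameter, and in particular the use of random circular time shifts as the resampling step are not taken from the paper.

```python
import bisect
import random


def mean_nearest_gap(a, b):
    """Mean time from each event in `a` to its nearest event in `b`.

    Both inputs are sorted lists of event times. Smaller values suggest
    that events in `a` tend to occur close in time to events in `b`.
    """
    if not a or not b:
        raise ValueError("both event series must be non-empty")
    total = 0.0
    for t in a:
        i = bisect.bisect_left(b, t)
        candidates = []
        if i < len(b):
            candidates.append(b[i] - t)      # nearest event at or after t
        if i > 0:
            candidates.append(t - b[i - 1])  # nearest event before t
        total += min(candidates)
    return total / len(a)


def coincidental_match_probability(a, b, period, n_resamples=1000, seed=0):
    """Estimate how often an unrelated series would score at least as well.

    Hypothetical resampling scheme (an assumption, not the paper's):
    series `b` is circularly shifted by a random offset within
    [0, period), which keeps its internal clustering but breaks any
    alignment with `a`.
    """
    rng = random.Random(seed)
    observed = mean_nearest_gap(a, b)
    hits = 0
    for _ in range(n_resamples):
        shift = rng.uniform(0, period)
        shifted = sorted((t + shift) % period for t in b)
        if mean_nearest_gap(a, shifted) <= observed:
            hits += 1
    # Add-one correction so the estimate is never exactly zero.
    return (hits + 1) / (n_resamples + 1)


# Toy data: series `b` trails series `a` by 0.05 time units.
a = [3.0, 17.0, 18.0, 42.0, 71.0, 90.0]
b = [t + 0.05 for t in a]
p = coincidental_match_probability(a, b, period=100.0)
```

A small `p` indicates that the observed closeness is unlikely to arise from a chance alignment under this particular resampling scheme. Circular shifting is only one option; a scheme that resamples whole sessions or blocks of events may be more appropriate when activity is strongly non-stationary.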
