Analyzing User-Event Data Using Score-Based Likelihood Ratios with Marked Point Processes

Abstract

In this paper we investigate the application of score-based likelihood ratio techniques to the problem of determining whether two time-stamped event streams were generated by the same source or by two different sources. We develop score functions for event data streams by building on ideas from the statistical modeling of marked point processes, focusing in particular on the coefficient of segregation and the mingling index. The methodology is applied to a data set consisting of logs of computer activity recorded over a 7-day period from 28 different individuals. Experimental results on known same-source and known different-source data sets indicate that the proposed scores have considerable power to discriminate between same-source and different-source pairs in this context. The paper concludes with a discussion of the potential benefits and challenges that may arise from applying statistical analysis to user-event data in digital forensics.
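The abstract names two marked point process statistics and a score-based likelihood ratio without defining them. The following Python code is a minimal sketch of all three. It assumes the two streams are superimposed on the time axis with marks A and B. The mean mingling index is the fraction of an event's k nearest neighbours in time that carry the other mark, averaged over all events. The coefficient of segregation follows Pielou's form: 1 minus the ratio of observed to expected mixed nearest-neighbour pairs, with the expected count taken as 2·n_A·n_B/(n - 1) under random labelling. The score-based likelihood ratio is the ratio of Gaussian kernel density estimates fitted to reference same-source and different-source scores. The function names, the choice of k, the tie-breaking rule and the rule-of-thumb bandwidth are illustrative assumptions, not the paper's exact definitions.

```python
import math
from statistics import stdev


def merge_streams(times_a, times_b):
    """Superimpose two event streams into one marked pattern on the time axis.

    Returns (time, mark) pairs sorted by time; marks are "A" or "B".
    """
    events = [(t, "A") for t in times_a] + [(t, "B") for t in times_b]
    return sorted(events)


def nearest_neighbors(events, i, k):
    """Indices of the k events closest in time to event i, excluding i itself.

    Brute force, O(n log n) per event; ties keep index order (stable sort).
    """
    t_i = events[i][0]
    others = [j for j in range(len(events)) if j != i]
    others.sort(key=lambda j: abs(events[j][0] - t_i))
    return others[:k]


def mean_mingling_index(events, k=1):
    """Average over all events of the fraction of the k nearest neighbours with a different mark."""
    total = 0.0
    for i, (_, mark) in enumerate(events):
        nbrs = nearest_neighbors(events, i, k)
        total += sum(events[j][1] != mark for j in nbrs) / k
    return total / len(events)


def segregation_coefficient(events):
    """Pielou-style coefficient: 1 - observed / expected mixed nearest-neighbour pairs.

    Expected count assumes random labelling: 2 * n_a * n_b / (n - 1).
    Requires both marks to be present (otherwise the expected count is zero).
    """
    n = len(events)
    n_a = sum(1 for _, m in events if m == "A")
    n_b = n - n_a
    observed = sum(
        events[nearest_neighbors(events, i, 1)[0]][1] != m
        for i, (_, m) in enumerate(events)
    )
    expected = 2 * n_a * n_b / (n - 1)
    return 1 - observed / expected


def gaussian_kde(samples):
    """Gaussian kernel density estimate; bandwidth is the assumed rule of thumb 1.06 * sd * n^(-1/5)."""
    n = len(samples)
    h = 1.06 * stdev(samples) * n ** (-1 / 5)
    norm = n * h * math.sqrt(2 * math.pi)

    def density(x):
        return sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples) / norm

    return density


def score_based_lr(score, same_source_scores, diff_source_scores):
    """Score-based LR: density of the score under same-source vs different-source references."""
    f_same = gaussian_kde(same_source_scores)
    f_diff = gaussian_kde(diff_source_scores)
    return f_same(score) / f_diff(score)


# Example: two streams in separate stretches of time vs. two interleaved streams.
separated = merge_streams([1, 2, 3], [10, 11, 12])
interleaved = merge_streams([1, 3, 5], [2, 4, 6])
print(segregation_coefficient(separated), mean_mingling_index(separated))
print(segregation_coefficient(interleaved), mean_mingling_index(interleaved))
```

A coefficient near 1 (and a mingling index near 0) means the two streams occupy separate stretches of time. A negative coefficient (and a mingling index near 1) means the events of the two streams alternate. In the likelihood ratio, values above 1 favour the same-source hypothesis and values below 1 favour the different-source hypothesis. In practice the reference score samples would come from known same-source and known different-source pairs, such as those described above.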
