On Descriptive and Predictive Models for Serial Crime Analysis

Law enforcement agencies regularly collect crime scene information. However, no detailed, systematic procedure exists for doing so. The data collected is affected by the experience or current condition of the law enforcement officers, and can consequently differ vastly between crime scenes. This is especially problematic when investigating volume crimes. Law enforcement officers regularly compare crimes manually based on the collected data. This is a time-consuming process, especially as the collected crime scene information is not always comparable. Structuring the data and introducing automatic comparison systems could benefit the investigation process. This thesis investigates descriptive and predictive models for automatic comparison of crime scene data, with the purpose of aiding law enforcement investigations. The thesis first investigates predictive and descriptive methods, with a focus on data structuring, comparison, and evaluation of methods. This knowledge is then applied to the domain of crime scene analysis, with a focus on detecting serial residential burglaries. The thesis introduces a procedure for the systematic collection of crime scene information. It also investigates the impact of, and relationships between, crime scene characteristics, and how to evaluate the results of the descriptive models. The results suggest that descriptive and predictive models can provide feedback for crime scene analysis that allows a more effective use of law enforcement resources. Descriptive models based on crime characteristics, including modus operandi, allow law enforcement agents to filter cases intelligently. Further, by estimating the link probability between cases, law enforcement agents can focus on the cases most likely to be linked, which could potentially increase clear-up rates.
