Machine learning in computer forensics (and the lessons learned from machine learning in computer security)

In this paper, we discuss the role that machine learning can play in computer forensics. We begin by considering the role that machine learning has gained in computer security applications, so that the computer forensics community can learn from the experience of the computer security community. We then present a brief literature review that illustrates the areas of computer forensics where machine learning techniques have been used so far. Next, we outline the technical requirements that tools for computer security and computer forensics applications should meet, to show how machine learning algorithms can be of practical help. We intend this paper to foster applications of machine learning in computer forensics, and we hope that the ideas presented here point to promising directions in the quest for more efficient and effective computer forensics tools.
