A comparison of machine learning techniques for file system forensics analysis

Abstract: With the remarkable increase in computer crimes, particularly Internet-related crimes, digital forensics has become an urgent and timely subject of study. A digital forensics investigation normally aims to preserve evidence in its most original form by identifying, collecting, and validating digital information for the purpose of reconstructing past events. Most digital evidence is stored within the computer's file system. This research investigates and evaluates the applicability of several machine learning techniques to identifying incriminating evidence by tracing historical file system activities, in order to determine how files can be manipulated by different application programs. A dataset defined by a matrix/vector of features related to file system activity during a specific period of time was collected and used to train several machine learning techniques. Overall, the techniques considered showed good results when evaluated on a testing dataset containing unseen evidence. However, all algorithms encountered one essential obstacle, which could be the main reason the experimental results fell short of expectations: the overlaps among the file system activities.
