A Method for Locating Digital Evidences with Outlier Detection Using Support Vector Machine

One of the biggest challenges facing digital investigators is the sheer volume of data that must be searched to locate digital evidence. How to locate evidence of a computer crime efficiently while maintaining accuracy has become a research focus. In this paper, we introduce a two-tier method that automates the process of locating digital evidence. The first tier employs a one-class Support Vector Machine (SVM) outlier detector to filter out records that are insignificant to forensic investigators. The second tier uses a group of one-class SVM classifiers, each trained on expert knowledge or on samples of interest to an investigator and each based on a different feature vector, to further analyze the output of the outlier detector and improve the accuracy of the investigation. The effectiveness of the proposed method is demonstrated on a public dataset, the KDD Cup 99 (Knowledge Discovery and Data Mining) intrusion detection dataset.
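The two-tier idea can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes scikit-learn's `OneClassSVM` as a stand-in for the one-class SVM, uses synthetic Gaussian data rather than KDD Cup 99, and the pattern names (`pattern_a`, `pattern_b`), feature subsets, and `nu` values are hypothetical choices made only for illustration.

```python
import numpy as np
from sklearn.svm import OneClassSVM


def train_two_tier(normal_records, interest_samples, feature_subsets, nu=0.05):
    """Fit a tier-1 outlier detector and one tier-2 classifier per pattern."""
    # Tier 1: estimate the support of ordinary records; records outside it
    # are treated as outliers worth an investigator's attention.
    detector = OneClassSVM(kernel="rbf", nu=nu, gamma="scale").fit(normal_records)
    # Tier 2: one one-class SVM per pattern of interest, each trained on
    # its own (hypothetical) subset of the feature columns.
    classifiers = {}
    for name, samples in interest_samples.items():
        cols = feature_subsets[name]
        clf = OneClassSVM(kernel="rbf", nu=0.1, gamma="scale")
        classifiers[name] = clf.fit(samples[:, cols])
    return detector, classifiers


def locate_evidence(records, detector, classifiers, feature_subsets):
    """Map each tier-1 outlier index to the tier-2 patterns that accept it."""
    # OneClassSVM.predict returns +1 for inliers and -1 for outliers.
    outlier_idx = np.flatnonzero(detector.predict(records) == -1)
    findings = {}
    for i in outlier_idx:
        row = records[i:i + 1]
        findings[int(i)] = [
            name for name, clf in classifiers.items()
            if clf.predict(row[:, feature_subsets[name]])[0] == 1
        ]
    return findings


# Synthetic stand-in data (assumption): ordinary records cluster at 0,
# two illustrative patterns of interest cluster at +6 and -6.
rng = np.random.default_rng(0)
normal = rng.normal(0.0, 1.0, size=(500, 4))
interest = {
    "pattern_a": rng.normal(6.0, 0.5, size=(100, 4)),
    "pattern_b": rng.normal(-6.0, 0.5, size=(100, 4)),
}
subsets = {"pattern_a": [0, 1], "pattern_b": [2, 3]}

detector, classifiers = train_two_tier(normal, interest, subsets)
unseen = np.vstack([
    rng.normal(0.0, 1.0, size=(50, 4)),   # rows 0-49: ordinary records
    rng.normal(6.0, 0.5, size=(10, 4)),   # rows 50-59: resemble pattern_a
    rng.normal(-6.0, 0.5, size=(10, 4)),  # rows 60-69: resemble pattern_b
])
findings = locate_evidence(unseen, detector, classifiers, subsets)
print(findings)
```

In this sketch the first tier shrinks the set of records an investigator has to examine, and the second tier attaches zero or more pattern labels to each remaining record. An outlier that no second-tier classifier accepts is still reported, with an empty label list, so that unfamiliar records are not silently discarded.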
