Enhancing Digital Forensic Analysis Using Memetic Algorithm Feature Selection Method for Document Clustering

Text clustering supports crime investigation by grouping crime-related documents. This paper proposes a Memetic Algorithm Feature Selection (MAFS) approach to improve the performance of document clustering algorithms used to partition crime reports and criminal news, as well as several benchmark text datasets. Two clustering algorithms, k-means and spherical k-means (Spk), were selected to demonstrate the effectiveness of the proposed method. These algorithms were chosen so that their performance could be observed before and after applying a hybrid feature selection (FS) method based on a memetic scheme. The proposed MAFS method combines a Genetic Algorithm-based wrapper FS with the Relief-F filter. Performance was evaluated by comparing the clustering outcomes obtained before and after applying MAFS. The test results showed that the performance of both k-means and Spk improved after MAFS was applied.
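The general shape of a memetic wrapper-filter scheme can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `memetic_feature_selection`, its parameters, the toy fitness function, and the hard-coded filter scores are all assumptions made for illustration. In the setting described above, the fitness would be a clustering quality measure computed on the selected features, and the filter scores would be Relief-F weights.

```python
import random

def memetic_feature_selection(n_features, fitness, filter_scores,
                              pop_size=20, generations=30,
                              mutation_rate=0.05, seed=0):
    """Illustrative GA + filter-guided local search over 0/1 feature masks.

    fitness: callable(mask) -> float, higher is better (the wrapper part).
    filter_scores: one relevance score per feature (the filter part).
    """
    rng = random.Random(seed)
    # Rank features once by filter score, most relevant first.
    ranked = sorted(range(n_features), key=lambda j: filter_scores[j],
                    reverse=True)

    def local_search(mask):
        # Try adding the best-ranked excluded feature and dropping the
        # worst-ranked included one; keep a move only if fitness improves.
        best, best_fit = mask, fitness(mask)
        excluded = [j for j in ranked if not mask[j]]
        included = [j for j in reversed(ranked) if mask[j]]
        candidates = []
        if excluded:
            add = list(mask)
            add[excluded[0]] = 1
            candidates.append(add)
        if len(included) > 1:
            drop = list(mask)
            drop[included[0]] = 0
            candidates.append(drop)
        for cand in candidates:
            f = fitness(cand)
            if f > best_fit:
                best, best_fit = cand, f
        return best, best_fit

    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best_mask, best_fit = None, float("-inf")
    for _ in range(generations):
        # Memetic step: refine every individual before selection.
        scored = [local_search(ind) for ind in population]
        for mask, fit in scored:
            if fit > best_fit:
                best_mask, best_fit = list(mask), fit

        def pick():
            # Binary tournament selection.
            a, b = rng.sample(scored, 2)
            return a[0] if a[1] >= b[1] else b[0]

        children = [list(best_mask)]  # elitism: carry the best mask over
        while len(children) < pop_size:
            p1, p2 = pick(), pick()
            cut = rng.randrange(1, n_features)  # one-point crossover
            child = p1[:cut] + p2[cut:]
            # Bit-flip mutation.
            child = [bit ^ (rng.random() < mutation_rate) for bit in child]
            children.append(child)
        population = children
    return best_mask, best_fit


# Toy problem (hypothetical): features 0-2 are informative, the rest are noise.
INFORMATIVE = {0, 1, 2}

def toy_fitness(mask):
    # Stand-in for a clustering quality score: reward informative
    # features and penalize the size of the selected subset.
    return sum(mask[j] for j in INFORMATIVE) - 0.1 * sum(mask)

# Stand-in for Relief-F weights (assumed values, not computed from data).
toy_scores = [0.9, 0.8, 0.7, 0.1, 0.1, 0.05, 0.05, 0.0]

best_mask, best_fit = memetic_feature_selection(8, toy_fitness, toy_scores)
print(best_mask, best_fit)
```

The local search is what distinguishes this from a plain genetic algorithm: each candidate subset is nudged toward features the filter ranks highly, but a move is kept only when the wrapper fitness confirms the improvement.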
