Text Clustering for Digital Forensics Analysis

In recent decades, digital forensics has become a prominent activity in modern investigations. An important source of evidence is often the information stored on the devices under investigation. Because this kind of inquiry is complex, the digital tools that support it are a central concern. This paper introduces a clustering-based text mining technique for investigative purposes. The proposed methodology is applied experimentally to the publicly available Enron dataset, which fits a plausible forensic analysis context well.
