Identification and Localization of Data Types within Large-Scale File Systems

This research examines the application of statistical analysis techniques to identify the data types embedded within a file, helping analysts locate data potentially relevant to criminal activity. The results show that statistical analysis can effectively aid in identifying both the types of data embedded in a file and the approximate locations of those data types. The analysis identifies component data types irrespective of the type of file being analyzed. When applied, this technique will allow analysts to locate relevant data on a hard drive more effectively and efficiently, especially on today's very large hard drives.
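The abstract does not say which statistics are used, so the following is only a minimal sketch of the general idea, assuming a sliding-window Shannon entropy as the statistic. The function names (`shannon_entropy`, `window_profile`, `locate_regions`), the 256-byte window, and the 6.0 bits-per-byte threshold are illustrative assumptions, not the authors' method. It shows how a per-window statistic can indicate both the kind of data present and its approximate offset within a file.

```python
import math
from collections import Counter


def shannon_entropy(block: bytes) -> float:
    """Shannon entropy of a byte block, in bits per byte (0.0 to 8.0)."""
    if not block:
        return 0.0
    total = len(block)
    counts = Counter(block)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def window_profile(data: bytes, window: int = 256, step: int = 256):
    """Return a list of (offset, entropy) pairs, one per window."""
    last_start = max(len(data) - window + 1, 1)
    return [
        (off, shannon_entropy(data[off:off + window]))
        for off in range(0, last_start, step)
    ]


def locate_regions(data: bytes, window: int = 256, step: int = 256,
                   threshold: float = 6.0):
    """Merge adjacent windows on the same side of the (assumed) threshold.

    Returns (start, end, label) tuples giving approximate byte ranges.
    """
    regions = []
    for off, entropy in window_profile(data, window, step):
        label = "high-entropy" if entropy >= threshold else "low-entropy"
        end = min(off + window, len(data))
        if regions and regions[-1][2] == label:
            # Extend the previous region instead of starting a new one.
            regions[-1] = (regions[-1][0], end, label)
        else:
            regions.append((off, end, label))
    return regions


# Synthetic "file": a repetitive run followed by a uniformly distributed run.
sample = b"A" * 1024 + bytes(range(256)) * 4
regions = locate_regions(sample)
```

In this sketch, windows dominated by a few byte values (for example padding or plain text) score low, while compressed or encrypted content tends to score near 8 bits per byte. A practical classifier would likely combine several statistics, such as byte-frequency or n-gram distributions, rather than rely on a single threshold.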
