Effective whitelisting for filesystem forensics

Forensic analysis of the large filesystems found on current computers requires an effective method for categorizing and prioritizing files so that the investigator is not overwhelmed. A key technique for this purpose is whitelisting: skipping the detailed analysis of files that match entries in a well-known reference collection. Effective use of this technique requires an efficient matching method that detects not only exact matches but also near or approximate matches. This paper outlines the requirements for such matching, formalizes them as the bounded best-match and approximate bounded near-match problems, and describes methods to solve these problems. In particular, the approximate bounded near-match problem is mapped to the problem of finding near neighbors in a high-dimensional metric space and solved using locality-sensitive hashing.
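To make the last idea concrete, the following is a minimal sketch of locality-sensitive hashing by bit sampling over Hamming space. It is an illustration under stated assumptions, not the method of the paper: it assumes each file has already been reduced to a fixed-length bit vector (how that fingerprint is derived is not specified here), and the class name BitSamplingLSH, its parameters, and the entry names are all hypothetical.

```python
import random
from collections import defaultdict


class BitSamplingLSH:
    """Toy LSH index for fixed-length bit vectors under Hamming distance.

    Assumption: every file is represented by an integer whose low
    n_bits bits form its fingerprint. This is illustrative only.
    """

    def __init__(self, n_bits, n_tables=8, bits_per_key=12, seed=0):
        rng = random.Random(seed)
        # Each table keys a vector by a random subset of bit positions.
        self.projections = [
            rng.sample(range(n_bits), bits_per_key) for _ in range(n_tables)
        ]
        self.tables = [defaultdict(list) for _ in range(n_tables)]
        self.items = {}

    @staticmethod
    def _key(vec, positions):
        # Project the vector onto the sampled bit positions.
        return tuple((vec >> p) & 1 for p in positions)

    def add(self, name, vec):
        """Insert a reference (whitelist) entry into every table."""
        self.items[name] = vec
        for positions, table in zip(self.projections, self.tables):
            table[self._key(vec, positions)].append(name)

    def query(self, vec, max_distance):
        """Return (distance, name) pairs within max_distance, nearest first.

        Only entries that collide with vec in at least one table are
        checked, so near matches can be missed (the search is approximate).
        """
        candidates = set()
        for positions, table in zip(self.projections, self.tables):
            candidates.update(table.get(self._key(vec, positions), ()))
        hits = [(bin(self.items[n] ^ vec).count("1"), n) for n in candidates]
        return sorted((d, n) for d, n in hits if d <= max_distance)


index = BitSamplingLSH(n_bits=64, seed=1)
index.add("reference_a", 0x0123456789ABCDEF)
index.add("reference_b", 0xFFFFFFFF00000000)

# An identical fingerprint collides in every table, so it is always found.
print(index.query(0x0123456789ABCDEF, max_distance=0))  # [(0, 'reference_a')]
```

The distance bound is enforced by an exact check on the candidates, so every reported match is genuinely within the bound; the approximation lies only in which candidates are examined. Raising n_tables makes a miss less likely, while raising bits_per_key reduces the number of false candidates to check.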
