Creating Realistic Corpora for Security and Forensic Education

We present work on the design, implementation, distribution, and use of realistic forensic datasets to support digital forensics and security education. In particular, we describe the “M57-Patents” scenario, a multi-modal corpus consisting of hard drive images, RAM images, network captures, and images from other devices typically found in forensic investigations, such as USB drives and cellphones. The corpus was created as part of a scripted scenario; consequently, it is less “noisy” than real-world data but retains the complexity necessary to support a wide variety of forensic education activities. Realistic forensic corpora allow direct comparison of approaches and tools across classrooms and institutions, reduce the time required to prepare useful educational materials, and eliminate concerns about exposing students to privacy-sensitive or illegal digital materials. The “M57-Patents” corpus contains no rights-restricted materials and can be freely redistributed; it is available with disk images packaged in both open (Advanced Forensic Format) and commercial (EnCase) formats.
