Digital media triage with bulk data analysis and bulk_extractor

Bulk data analysis eschews file extraction and analysis, common in forensic practice today, and instead processes data in "bulk," recognizing and extracting salient details ("features") of use in the typical digital forensics investigation. This article presents the requirements, design and implementation of the bulk_extractor, a high-performance carving and feature extraction tool that uses bulk data analysis to allow the triage and rapid exploitation of digital media. Bulk data analysis and the bulk_extractor are designed to complement traditional forensic approaches, not replace them. The approach and implementation offer several important advances over today's forensic tools, including optimistic decompression of compressed data, context-based stop-lists, and the use of a "forensic path" to document both the physical location and forensic transformations necessary to reconstruct extracted evidence. The bulk_extractor is a stream-based forensic tool, meaning that it scans the entire media from beginning to end without seeking the disk head, and is fully parallelized, allowing it to work at the maximum I/O capabilities of the underlying hardware (provided that the system has sufficient CPU resources). Although bulk_extractor was developed as a research prototype, it has proved useful in actual police investigations, two of which this article recounts.
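The two ideas named above, optimistic decompression and the forensic path, can be illustrated with a minimal sketch. It is not bulk_extractor's implementation: the function names (`try_gunzip`, `scan`), the e-mail regular expression, the recursion limit, and the exact path string format are assumptions made for illustration. The sketch scans a raw buffer for e-mail addresses, tries to decompress anything that looks like the start of a gzip stream, and recursively scans whatever decompresses. Each feature is reported with a path that records the byte offset and each transformation applied to reach it.

```python
import re
import zlib

GZIP_MAGIC = b"\x1f\x8b\x08"  # gzip header: ID1, ID2, CM=deflate (RFC 1952)
EMAIL_RE = re.compile(rb"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")


def try_gunzip(data, max_out=1 << 20):
    """Optimistically decompress a gzip stream starting at data[0].

    Returns the decompressed bytes (possibly partial if the stream is
    truncated), or None if the bytes are not a valid gzip stream.
    """
    d = zlib.decompressobj(zlib.MAX_WBITS | 16)  # 16 selects the gzip wrapper
    try:
        out = d.decompress(data, max_out)
    except zlib.error:
        return None
    return out or None


def scan(buf, path_prefix="", depth=0, max_depth=3):
    """Return (forensic_path, feature) pairs found in buf.

    The path format "offset-GZIP-offset" is an illustrative assumption.
    """
    features = []
    for m in EMAIL_RE.finditer(buf):
        features.append((f"{path_prefix}{m.start()}", m.group().decode("ascii")))

    if depth < max_depth:
        pos = buf.find(GZIP_MAGIC)
        while pos != -1:
            out = try_gunzip(buf[pos:])
            if out:
                # Record the transformation in the path, then scan the result.
                features.extend(
                    scan(out, f"{path_prefix}{pos}-GZIP-", depth + 1, max_depth)
                )
            pos = buf.find(GZIP_MAGIC, pos + 1)
    return features
```

Given a buffer in which a gzip stream begins at byte 40 and the decompressed data contains an address at byte 9, the sketch reports the path `40-GZIP-9`: read the media at offset 40, apply gzip decompression, and look 9 bytes into the result. A plain keyword search over the raw bytes would miss that address because it exists only in compressed form.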
