Approximate Matching: Definition and Terminology

This document defines approximate matching and establishes terminology for describing it. Approximate matching is a promising technology for identifying similarities between two digital artifacts. It is used to find objects that resemble each other or objects that are contained in another object, which is useful for filtering data in security monitoring, digital forensics, and other applications. The definition and terminology are intended to promote discussion, research, tool development, and tool acquisition.
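The two uses named above, resemblance and containment, can be illustrated with a minimal sketch. It assumes artifacts are compared as sets of overlapping byte n-grams, which is only one of many possible feature choices and is not a definition taken from this document. The function names and the default n-gram length of 4 are hypothetical, chosen for illustration.

```python
def ngrams(data: bytes, n: int = 4) -> set:
    """Return the set of all overlapping n-byte substrings of data."""
    return {data[i:i + n] for i in range(len(data) - n + 1)}


def resemblance(a: bytes, b: bytes, n: int = 4) -> float:
    """Shared n-grams divided by all n-grams of both inputs (Jaccard index)."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    union = sa | sb
    return len(sa & sb) / len(union) if union else 0.0


def containment(a: bytes, b: bytes, n: int = 4) -> float:
    """Fraction of a's n-grams that also occur in b."""
    sa, sb = ngrams(a, n), ngrams(b, n)
    return len(sa & sb) / len(sa) if sa else 0.0


if __name__ == "__main__":
    fragment = b"hello world"
    document = b"say hello world again"
    # The fragment is embedded in the document, so containment is high
    # even though the two objects as a whole resemble each other less.
    print("containment:", containment(fragment, document))
    print("resemblance:", resemblance(fragment, document))
```

In this sketch a small object embedded in a larger one scores high on containment while scoring lower on resemblance, which is the distinction between the two use cases. Practical approximate matching tools typically compare compact digests rather than full feature sets.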
