A fuzzy hashing technique for large scale software birthmarks

Software birthmarks have been proposed as a means of detecting potentially stolen programs by measuring the similarity between two programs. A birthmark is created from each program by extracting its native characteristics, and the birthmarks of the two programs are then compared. However, because extracted birthmarks contain a large amount of information, comparing large programs takes a long time. This paper describes our work to reduce this comparison time; faster comparisons make it feasible to evaluate large programs and simplify the use of birthmarks. Specifically, our method applies fuzzy hashing to conventional birthmark information and then measures the similarity of the programs using the resulting hash values. With the proposed method, we obtained a major speed increase over the conventional birthmark method, with distinction rates of over 90%. On the other hand, because preservation performance decreased substantially, the similarity threshold must be lowered when using the proposed method.
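
The general idea of hashing a birthmark and comparing the hashes can be illustrated with a minimal sketch. It is not the implementation evaluated in this paper: it assumes a birthmark is available as a sequence of strings (for example, opcode or API-call names), uses a simplified context-triggered piecewise hash, and scores two signatures with Python's `difflib.SequenceMatcher`. The function names `fuzzy_hash` and `similarity`, the parameters `window` and `modulus`, and the sample data are all hypothetical.

```python
import hashlib
from difflib import SequenceMatcher

# 64-character alphabet; each chunk of the birthmark maps to one character.
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+/"


def _h(text):
    """Stable 32-bit hash of a string (built-in hash() is salted per run)."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


def fuzzy_hash(elements, window=3, modulus=4):
    """Return a short signature for a birthmark given as a sequence of strings.

    Assumption: a chunk boundary is declared wherever the hash of the last
    `window` elements is congruent to modulus - 1, so boundaries depend only
    on local context and a small edit changes only nearby characters.
    """
    elements = list(elements)
    signature = []
    chunk = []
    for i, element in enumerate(elements):
        chunk.append(element)
        context = "\x00".join(elements[max(0, i - window + 1): i + 1])
        if _h(context) % modulus == modulus - 1:
            # Close the current chunk and emit one character for it.
            signature.append(ALPHABET[_h("\x00".join(chunk)) % len(ALPHABET)])
            chunk = []
    if chunk:
        # Emit the trailing chunk that did not end on a boundary.
        signature.append(ALPHABET[_h("\x00".join(chunk)) % len(ALPHABET)])
    return "".join(signature)


def similarity(sig_a, sig_b):
    """Similarity of two signatures in [0, 1] (1.0 for identical signatures)."""
    return SequenceMatcher(None, sig_a, sig_b).ratio()


if __name__ == "__main__":
    # Hypothetical birthmarks: an original and a copy with one inserted element.
    original = ["iload", "iadd", "istore", "invokevirtual", "return"] * 20
    modified = original[:50] + ["nop"] + original[50:]
    print(similarity(fuzzy_hash(original), fuzzy_hash(modified)))
```

Because the signature is never longer than the birthmark, and with these settings is expected to be several times shorter, comparing signatures is cheaper than comparing the birthmarks directly. The price is lost detail, which is consistent with the need for a lower similarity threshold noted above.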
