Customizing k-Gram Based Birthmark through Partial Matching in Detecting Software Thefts

The k-gram based birthmark is a method for comparing binary programs to find similar software, such as stolen software or shared modules. Because the method compares opcode sequences directly, it is sensitive to program changes such as optimization or obfuscation. In this paper, we present a method for customizing the k-gram birthmark so that it tolerates slight program changes by employing partial matching of k-grams. We apply the customized k-gram birthmark to Java application environments and evaluate it on real-world Java applications. The experimental results show that customizing the k-gram birthmark improves its credibility and resilience in comparing binary programs.
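The idea of partial matching can be illustrated with a minimal sketch. It is not the paper's definition: the tolerance rule used here (two k-grams match when they differ in at most `max_mismatch` positions) is an assumption chosen for illustration, as are the function names, the containment-style similarity score, and the sample opcode sequences.

```python
from typing import Sequence, Set, Tuple

KGram = Tuple[str, ...]


def kgrams(opcodes: Sequence[str], k: int) -> Set[KGram]:
    """Return the set of all length-k windows over an opcode sequence."""
    return {tuple(opcodes[i:i + k]) for i in range(len(opcodes) - k + 1)}


def partially_matches(a: KGram, b: KGram, max_mismatch: int) -> bool:
    """Assumed tolerance rule: at most max_mismatch positions may differ."""
    return sum(x != y for x, y in zip(a, b)) <= max_mismatch


def similarity(original: Sequence[str], suspect: Sequence[str],
               k: int = 3, max_mismatch: int = 0) -> float:
    """Fraction of the original's k-grams that have a match in the suspect.

    With max_mismatch=0 this reduces to exact k-gram set containment.
    """
    p = kgrams(original, k)
    q = kgrams(suspect, k)
    if not p:
        return 0.0
    matched = sum(
        1 for g in p
        if any(partially_matches(g, h, max_mismatch) for h in q)
    )
    return matched / len(p)


# Hypothetical opcode sequences: the suspect replaces one opcode (iadd -> isub).
original = ["iload", "iload", "iadd", "istore", "iload", "ireturn"]
suspect = ["iload", "iload", "isub", "istore", "iload", "ireturn"]

exact_score = similarity(original, suspect, k=3, max_mismatch=0)
partial_score = similarity(original, suspect, k=3, max_mismatch=1)
print(exact_score, partial_score)
```

In this toy case a single substituted opcode disturbs three of the four 3-grams, so exact matching scores the pair low, while allowing one mismatch per k-gram recovers the match. A larger tolerance also raises the chance of matching unrelated programs, which is the trade-off between resilience and credibility that the abstract refers to.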
