Effective and efficient detection of software theft via dynamic API authority vectors

We design a novel program feature for detecting software theft. The feature reflects both the sequence and the frequency information of a program. The proposed method is credible, resilient, and scalable. It outperforms existing software theft detection methods in our experiments.

Software theft has become a serious threat to both the software industry and individual software developers. A software birthmark captures the unique characteristics of a program and can be used to analyze the similarity of a pair of programs and thereby detect theft. This paper proposes a novel birthmark, the dynamic API authority vector (DAAV). DAAV satisfies four essential requirements for a good birthmark: credibility, resiliency, scalability, and being packing-free. In contrast, existing static birthmarks cannot handle packed programs, and existing dynamic birthmarks do not satisfy credibility and resiliency. In extensive experiments with a set of Windows applications, DAAV shows higher credibility and resiliency than existing dynamic birthmarks, together with accuracy comparable to that of existing static birthmarks. These results indicate that the proposed birthmark provides high accuracy and also handles packed programs successfully when detecting software theft.
