Finding similar defects using synonymous identifier retrieval

When we encounter a defect in one part of a program, it is very important to find other parts of the program that may contain similar defects. In this paper, we propose a novel system to find similar defects in the large collection of source code. This system takes a code fragment containing a defect as the query input, and returns code fragments containing the same or synonymous identifiers which appear in the input fragment. Case studies with two open source systems and their defect data show the advantages of the proposed retrieval system, compared to the code-clone based retrievals.

[1]  Susan Horwitz,et al.  Using Slicing to Identify Duplication in Source Code , 2001, SAS.

[2]  Zhendong Su,et al.  DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[3]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[4]  Ido Dagan,et al.  Similarity-Based Models of Word Cooccurrence Probabilities , 1998, Machine Learning.

[5]  Hajimu Iida,et al.  SHINOBI: A real-time code clone detection tool for software maintenance , 2008 .

[6]  Jens Krinke,et al.  Identifying similar code with program dependence graphs , 2001, Proceedings Eighth Working Conference on Reverse Engineering.

[7]  Shinji Kusumoto,et al.  Simultaneous Modification Support based on Code Clone Analysis , 2007, 14th Asia-Pacific Software Engineering Conference (APSEC'07).

[8]  Richard Levins,et al.  Why Programs Fail , 2010 .

[9]  Nicole Immorlica,et al.  Locality-sensitive hashing scheme based on p-stable distributions , 2004, SCG '04.

[10]  Zhendong Su,et al.  Scalable detection of semantic clones , 2008, 2008 ACM/IEEE 30th International Conference on Software Engineering.

[11]  Ettore Merlo,et al.  Experiment on the automatic detection of function clones in a software system using metrics , 1996, 1996 Proceedings of International Conference on Software Maintenance.

[12]  Jianhua Lin,et al.  Divergence measures based on the Shannon entropy , 1991, IEEE Trans. Inf. Theory.

[13]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[14]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[15]  Zhenmin Li,et al.  PR-Miner: automatically extracting implicit programming rules and detecting violations in large software code , 2005, ESEC/FSE-13.

[16]  Stéphane Ducasse,et al.  A language independent approach for detecting duplicated code , 1999, Proceedings IEEE International Conference on Software Maintenance - 1999 (ICSM'99). 'Software Maintenance for Business Change' (Cat. No.99CB36360).

[17]  Brenda S. Baker,et al.  Finding Clones with Dup: Analysis of an Experiment , 2007, IEEE Transactions on Software Engineering.