Retrieving similar code fragments based on identifier similarity for defect detection

Similar source code fragments, known as code clones, may involve similar defects caused by the same mistake. However, code clone detection tools cannot detect certain code fragments (e.g. modified after copy-and-pasted). To support developers who would like to detect such defects, we propose a method to retrieve similar code fragments in source code based on the similarity of identifiers between a query and a target code fragment. We present two case studies of similar defects in open source systems.

[1]  Richard Levins,et al.  Why Programs Fail , 2010 .

[2]  Ettore Merlo,et al.  Assessing the benefits of incorporating function clone detection in a development process , 1997, 1997 Proceedings International Conference on Software Maintenance.

[3]  Brenda S. Baker,et al.  Finding Clones with Dup: Analysis of an Experiment , 2007, IEEE Transactions on Software Engineering.

[4]  Miryung Kim,et al.  An ethnographic study of copy and paste programming practices in OOPL , 2004, Proceedings. 2004 International Symposium on Empirical Software Engineering, 2004. ISESE '04..

[5]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..