CLCMiner: Detecting Cross-Language Clones without Intermediates

SUMMARY The proliferation of diverse programming languages and platforms makes it common to implement the same functionality in different languages for different platforms, such as Java for Android applications and C# for Windows Phone applications. Although versions of code written in different languages look syntactically quite different, they are intended to implement the same software and typically contain many code snippets that implement similar functionality. We call these cross-language clones. When the code in one language evolves in response to changing functional requirements and/or bug fixes, its cross-language clones may also need to be changed to keep the implementations of the same functionality consistent. Automated techniques are therefore needed to locate and track cross-language clones in evolving software. In the literature, approaches for detecting cross-language clones apply only to languages that share a common intermediate language (such as the .NET language family), because they are built on techniques for detecting single-language clones. To extend cross-language clone detection to a wider range of languages, we propose CLCMiner, a novel automated approach that does not require an intermediate language. It mines such clones from revision histories. This rests on two assumptions: that revisions to versions of code implemented in different languages may naturally reflect how programmers change cross-language clones in practice, and that similarities among those revisions (referred to as clones in diffs, or diff clones) may indicate actually similar code. We have implemented a prototype and applied it to ten open-source projects implemented in both Java and C#. The reported clones that occur in revision histories have high precision (89% on average) and recall (95% on average). Compared with token-based clone detection tools that treat code as plain text, our tool detects significantly more cross-language clones. All the evaluation results demonstrate the feasibility of revision-history-based techniques for detecting cross-language clones without intermediates and point to promising future work.
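The summary says CLCMiner looks for similarities among revisions ("diff clones") but does not state how that similarity is computed. The sketch below illustrates the general idea only, under stated assumptions. It is not the CLCMiner algorithm. It takes the changed lines of a Java unified diff and a C# unified diff and tokenizes them. It then lowercases the tokens to hide casing conventions such as `length` vs `Length` and compares the two token multisets with a Jaccard score. The tokenizer regex, the tiny `KEYWORD_MAP`, the 0.5 threshold, and every function name (`changed_lines`, `token_bag`, `similarity`, `diff_clones`) are hypothetical.

```python
import re
from collections import Counter

# Hypothetical tokenizer: identifiers/keywords, integer literals,
# or any single non-space character (so "==" becomes "=", "=").
TOKEN_RE = re.compile(r"[A-Za-z_][A-Za-z0-9_]*|\d+|\S")

# Hypothetical, deliberately tiny Java -> C# normalization table.
KEYWORD_MAP = {"boolean": "bool"}


def changed_lines(diff_text: str) -> list[str]:
    """Return added/removed lines of a unified diff, without the +/- prefix."""
    result = []
    for line in diff_text.splitlines():
        # Skip file headers ("--- a/...", "+++ b/..."); "@@" hunk headers
        # and context lines fall through because they lack a +/- prefix.
        if line.startswith(("+++", "---")):
            continue
        if line.startswith(("+", "-")):
            result.append(line[1:])
    return result


def token_bag(lines: list[str]) -> Counter:
    """Multiset of normalized tokens; lowercasing hides Java/C# casing differences."""
    bag = Counter()
    for line in lines:
        for tok in TOKEN_RE.findall(line):
            tok = tok.lower()
            bag[KEYWORD_MAP.get(tok, tok)] += 1
    return bag


def similarity(diff_a: str, diff_b: str) -> float:
    """Multiset Jaccard similarity (sum of minima / sum of maxima) of changed tokens."""
    a = token_bag(changed_lines(diff_a))
    b = token_bag(changed_lines(diff_b))
    union = sum((a | b).values())  # Counter "|" keeps the max count per token
    return sum((a & b).values()) / union if union else 0.0  # "&" keeps the min


def diff_clones(java_diffs, csharp_diffs, threshold=0.5):
    """Return (i, j, score) for every Java/C# diff pair scoring at least threshold."""
    pairs = []
    for i, jd in enumerate(java_diffs):
        for j, cd in enumerate(csharp_diffs):
            score = similarity(jd, cd)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs


java_diff = """--- a/Util.java
+++ b/Util.java
-    if (name == null) return false;
+    if (name == null || name.length() == 0) return false;
"""
csharp_diff = """--- a/Util.cs
+++ b/Util.cs
-    if (name == null) return false;
+    if (name == null || name.Length == 0) return false;
"""
print(diff_clones([java_diff], [csharp_diff]))  # one pair: (0, 0, 28/30 ≈ 0.93)
```

In this toy example the two fixes differ only in `length()` versus the `Length` property, so they share 28 of 30 tokens. An unrelated C# change would score far below the threshold. A real tool would also need to decide which commits to pair and how to map these diff-level matches back to clone regions in the current code. The summary does not describe those steps, and the sketch does not attempt them.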

[1] Siau-Cheng Khoo,et al. Predicting Consistent Clone Change , 2016, 2016 IEEE 27th International Symposium on Software Reliability Engineering (ISSRE).

[2] Jianjun Zhao,et al. Mining revision histories to detect cross-language clones without intermediates , 2016, 2016 31st IEEE/ACM International Conference on Automated Software Engineering (ASE).

[3] Katsuro Inoue,et al. Towards Detection and Analysis of Interlanguage Clones for Multilingual Web Applications , 2016, 2016 IEEE 23rd International Conference on Software Analysis, Evolution, and Reengineering (SANER).

[4] Zhi Jin,et al. Building Program Vector Representations for Deep Learning , 2014, KSEM.

[5] Shane McIntosh,et al. Mining Co-change Information to Understand When Build Changes Are Necessary , 2014, 2014 IEEE International Conference on Software Maintenance and Evolution.

[6] Andrea De Lucia,et al. Labeling source code with information retrieval methods: an empirical study , 2013, Empirical Software Engineering.

[7] David Lo,et al. Understanding Widespread Changes: A Taxonomic Study , 2013, 2013 17th European Conference on Software Maintenance and Reengineering.

[8] Chanchal Kumar Roy,et al. Detecting Clones Across Microsoft .NET Programming Languages , 2012, 2012 19th Working Conference on Reverse Engineering.

[9] Katsuro Inoue,et al. Extracting code clones for refactoring using combinations of clone metrics , 2011, IWSC '11.

[10] Zhendong Su,et al. Automatic mining of functionally equivalent code fragments via random testing , 2009, ISSTA.

[11] Elmar Jürgens,et al. CloneDetective - A workbench for clone detection research , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[12] Daniel M. Germán,et al. The promises and perils of mining git , 2009, 2009 6th IEEE International Working Conference on Mining Software Repositories.

[13] Michael W. Godfrey,et al. “Cloning considered harmful” considered harmful: patterns of cloning in software , 2008, Empirical Software Engineering.

[14] Nicholas A. Kraft,et al. Cross-language Clone Detection , 2008, SEKE.

[15] Jens Krinke,et al. A Study of Consistent and Inconsistent Changes to Code Clones , 2007, 14th Working Conference on Reverse Engineering (WCRE 2007).

[16] Stéphane Ducasse,et al. Using concept analysis to detect co-change patterns , 2007, IWPSE '07.

[17] Giuliano Antoniol,et al. Comparison and Evaluation of Clone Detection Tools , 2007, IEEE Transactions on Software Engineering.

[18] Zhendong Su,et al. DECKARD: Scalable and Accurate Tree-Based Detection of Code Clones , 2007, 29th International Conference on Software Engineering (ICSE'07).

[19] Stéphane Ducasse,et al. Semantic clustering: Identifying topics in source code , 2007, Inf. Softw. Technol..

[20] Chanchal K. Roy,et al. A Survey on Software Clone Detection Research , 2007 .

[21] Kenny Wong,et al. Comprehension and Maintenance of Large-Scale Multi-Language Software Applications , 2006, 2006 22nd IEEE International Conference on Software Maintenance.

[22] Miryung Kim,et al. An empirical study of code clone genealogies , 2005, ESEC/FSE-13.

[23] Andreas Zeller,et al. Mining Version Histories to Guide Software Changes , 2004 .

[24] Shinji Kusumoto,et al. CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[25] Andrian Marcus,et al. Identification of high-level concept clones in source code , 2001, Proceedings 16th Annual International Conference on Automated Software Engineering (ASE 2001).

[26] Václav Rajlich,et al. Removing clones from the code , 1999, J. Softw. Maintenance Res. Pract..

[27] Zellig S. Harris,et al. Distributional Structure , 1954 .