Towards clone-and-own support: locating relevant methods in legacy products

Clone-and-Own (CAO) is a common practice in families of software products consisting of reusing code from methods in legacy products in new developments. In industrial scenarios, CAO consumes high amounts of time and effort without guaranteeing good results. We propose a novel approach, Computer Assisted CAO (CACAO), that given the natural language requirements of a new product, and the legacy products from that family, ranks the legacy methods in the family for each of the new product requirements according to their relevancy to the new development. We evaluated our approach in the industrial domain of train control software. Without CACAO, software engineers tasked with the development of a new product had to manually review a total of 2200 methods in the family. Results show that CACAO can reduce the number of methods to be reviewed, and guide software engineers towards the identification of relevant legacy methods to be reused in the new product.

[1]  Marsha Chechik,et al.  A Survey of Feature Location Techniques , 2013, Domain Engineering, Product Lines, Languages, and Conceptual Models.

[2]  Per Runeson,et al.  Guidelines for conducting and reporting case study research in software engineering , 2009, Empirical Software Engineering.

[3]  Feifan Liu,et al.  Unsupervised Approaches for Automatic Keyword Extraction Using Meeting Transcripts , 2009, NAACL.

[4]  Martin Porter,et al.  Snowball: A language for stemming algorithms , 2001 .

[5]  Krzysztof Czarnecki,et al.  Feature Diagrams and Logics: There and Back Again , 2007, 11th International Software Product Line Conference (SPLC 2007).

[6]  Shinji Kusumoto,et al.  CCFinder: A Multilinguistic Token-Based Code Clone Detection System for Large Scale Source Code , 2002, IEEE Trans. Software Eng..

[7]  Zarinah Mohd Kasirun,et al.  Feature extraction approaches from natural language requirements for reuse in software product lines: A systematic literature review , 2015, J. Syst. Softw..

[8]  Iadh Ounis,et al.  Automatically Building a Stopword List for an Information Retrieval System , 2005, J. Digit. Inf. Manag..

[9]  Yann-Gaël Guéhéneuc,et al.  Feature Location Using Probabilistic Ranking of Methods Based on Execution Scenarios and Information Retrieval , 2007, IEEE Transactions on Software Engineering.

[10]  Krzysztof Czarnecki,et al.  An Exploratory Study of Cloning in Industrial Software Product Lines , 2013, 2013 17th European Conference on Software Maintenance and Reengineering.

[11]  Yuanyuan Zhou,et al.  CP-Miner: finding copy-paste and related bugs in large-scale software code , 2006, IEEE Transactions on Software Engineering.

[12]  Felice Dell'Orletta,et al.  Mining commonalities and variabilities from natural language documents , 2013, SPLC '13.

[13]  Jane Cleland-Huang,et al.  On-demand feature recommendations derived from mining public product descriptions , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[14]  Harith Alani,et al.  On Stopwords, Filtering and Data Sparsity for Sentiment Analysis of Twitter , 2014, LREC.

[15]  Patrick F. Reidy An Introduction to Latent Semantic Analysis , 2009 .

[16]  Krzysztof Czarnecki,et al.  Mining configuration constraints: static analyses and empirical results , 2014, ICSE.

[17]  Andrew David Eisenberg,et al.  Dynamic feature traces: finding features in unfamiliar code , 2005, 21st IEEE International Conference on Software Maintenance (ICSM'05).

[18]  Ralf Lämmel,et al.  Flexible product line engineering with a virtual platform , 2014, ICSE Companion.

[19]  Krzysztof Czarnecki,et al.  Reverse engineering feature models , 2011, 2011 33rd International Conference on Software Engineering (ICSE).

[20]  Hoan Anh Nguyen,et al.  Complete and accurate clone detection in graph-based models , 2009, 2009 IEEE 31st International Conference on Software Engineering.

[21]  Michael McGill,et al.  Introduction to Modern Information Retrieval , 1983 .

[22]  Anette Hulth,et al.  Improved Automatic Keyword Extraction Given More Linguistic Knowledge , 2003, EMNLP.

[23]  Sebastian Erdweg,et al.  Variability-aware parsing in the presence of lexical macros and conditional compilation , 2011, OOPSLA '11.

[24]  Denys Poshyvanyk,et al.  Feature location via information retrieval based filtering of a single scenario execution trace , 2007, ASE.

[25]  Abdelhak-Djamel Seriai,et al.  Feature Location in a Collection of Product Variants: Combining Information Retrieval and Hierarchical Clustering , 2014, SEKE.

[26]  Bernardete Ribeiro,et al.  The importance of stop word removal on recall values in text categorization , 2003, Proceedings of the International Joint Conference on Neural Networks, 2003..

[27]  Claes Wohlin,et al.  Experimentation in Software Engineering , 2000, The Kluwer International Series in Software Engineering.

[28]  Harith Alani,et al.  Semantic Patterns for Sentiment Analysis of Twitter , 2014, SEMWEB.

[29]  Marsha Chechik,et al.  A framework for managing cloned product variants , 2013, 2013 35th International Conference on Software Engineering (ICSE).

[30]  Zhenchang Xing,et al.  Feature Location in a Collection of Product Variants , 2012, 2012 19th Working Conference on Reverse Engineering.

[31]  Marsha Chechik,et al.  Managing forked product variants , 2012, SPLC '12.

[32]  Christoph Pohl,et al.  An Exploratory Study of Information Retrieval Techniques in Domain Analysis , 2008, 2008 12th International Software Product Line Conference.