TextRank based search term identification for software change tasks

During maintenance, software developers deal with a number of software change requests. Each request is generally written in natural language and involves one or more domain-related concepts. To implement the requested change, a developer needs to map those concepts to exact source code locations within the project. This mapping generally starts with a search within the project, which requires one or more suitable search terms. Studies suggest that developers often perform poorly in coming up with good search terms for a change task. In this paper, we propose and evaluate a novel TextRank-based technique that automatically identifies and suggests search terms for a software change task by analyzing its task description. Experiments with 349 change tasks from two subject systems, and a comparison with a recent, closely related state-of-the-art approach, show that our technique is highly promising in terms of suggestion accuracy, mean average precision, and recall.
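The abstract does not give the details of the proposed technique, so the following is only a minimal sketch of generic TextRank keyword scoring applied to a task description, not the authors' implementation. It builds an undirected word co-occurrence graph and scores each word with a PageRank-style iteration. The stop-word list, the co-occurrence window, the damping factor, the iteration count, and all function names (`tokenize`, `build_graph`, `textrank`, `suggest_terms`) are assumptions made for illustration.

```python
import re
from collections import defaultdict

# Assumption: a tiny illustrative stop-word list, not the one used in the paper.
STOP_WORDS = {
    "the", "a", "an", "is", "are", "was", "be", "to", "of", "in", "on",
    "and", "or", "not", "does", "do", "when", "from", "for", "with", "it",
    "this", "that", "as", "at", "by",
}


def tokenize(text):
    """Lowercase the text and keep alphabetic words that are not stop words."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOP_WORDS and len(w) > 2]


def build_graph(tokens, window=2):
    """Link words that co-occur within `window` consecutive tokens (undirected)."""
    graph = defaultdict(set)
    for i, word in enumerate(tokens):
        for other in tokens[i + 1:i + window]:
            if other != word:
                graph[word].add(other)
                graph[other].add(word)
    return graph


def textrank(graph, damping=0.85, iterations=50):
    """Run a fixed number of PageRank-style updates over the word graph."""
    scores = {node: 1.0 for node in graph}
    for _ in range(iterations):
        new_scores = {}
        for node in graph:
            # Each neighbour shares its score equally among its own neighbours.
            rank = sum(scores[nb] / len(graph[nb]) for nb in graph[node])
            new_scores[node] = (1 - damping) + damping * rank
        scores = new_scores
    return scores


def suggest_terms(description, top_k=5):
    """Return the top_k highest-scoring words as candidate search terms."""
    scores = textrank(build_graph(tokenize(description)))
    # Ties are broken alphabetically so the ranking is deterministic.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]


if __name__ == "__main__":
    # Hypothetical change request text, written for this example only.
    task = ("Breakpoint view does not refresh when the breakpoint "
            "is removed from the debug breakpoint list")
    print(suggest_terms(task, top_k=3))
```

In this sketch a word that co-occurs with many distinct neighbours accumulates a higher score, which is the intuition behind using TextRank to pick search terms from a task description. A real implementation would likely add stemming and part-of-speech filtering, and iterate until the scores converge rather than for a fixed number of rounds.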
