Benchmarking Lightweight Techniques to Link E-Mails and Source Code

During the evolution of a software system, a large amount of information is produced that is not always directly related to the source code. Several researchers have provided evidence that the contents of mailing lists are a valuable source of such information: through e-mails, developers discuss design decisions, ideas, known problems, bugs, and other matters that cannot otherwise be found in the system. A technical challenge in this context is how to establish the missing link between free-form e-mails and the system artifacts they refer to. Although the range of approaches is vast, establishing their accuracy remains a problem, as there is no benchmark against which to compare their performance. To overcome this issue, we manually inspected a statistically significant number of e-mails pertaining to the ArgoUML system. Based on this benchmark, we present a variety of lightweight techniques to assign e-mails to software artifacts and measure their effectiveness in terms of precision and recall.
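To make the evaluation idea concrete, the following is a minimal sketch of how one lightweight linking technique could be scored against a manually built benchmark. It is not the method used in the paper, which the abstract does not detail: the whole-word, case-sensitive name matching, the function names `link_email` and `precision_recall`, and the sample class names and e-mail text are all assumptions made for illustration.

```python
import re


def link_email(text, class_names):
    """Return the class names that occur as whole words in the e-mail text.

    Assumed technique: case-sensitive, whole-word matching of entity names.
    """
    return {
        name
        for name in class_names
        if re.search(r"\b" + re.escape(name) + r"\b", text)
    }


def precision_recall(predicted, expected):
    """Compare predicted links with the benchmark links for one e-mail.

    By convention (an assumption here), an empty set yields 0.0
    instead of raising a division-by-zero error.
    """
    true_positives = len(predicted & expected)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(expected) if expected else 0.0
    return precision, recall


# Hypothetical data: class names and e-mail text are illustrative only.
class_names = {"ProjectBrowser", "Designer", "Critic"}
email_text = "The Designer thread blocks when ProjectBrowser repaints."

# Hypothetical manual annotation for this e-mail (the benchmark entry).
benchmark_links = {"ProjectBrowser", "Critic"}

predicted_links = link_email(email_text, class_names)
precision, recall = precision_recall(predicted_links, benchmark_links)
print(sorted(predicted_links), precision, recall)
```

In this sketch, precision is the fraction of predicted links that the benchmark confirms, and recall is the fraction of benchmark links that the technique finds. A common word such as "Designer" shows why plain name matching can hurt precision, while an artifact discussed without naming it lowers recall.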
