Exploring a Bayesian and linear approach to requirements traceability

Context: In large software projects it is important to maintain traceability between artefacts from different phases (e.g., requirements, designs, code), and between artefacts and the developers involved. However, if developers find capturing traceability information during the project laborious, they often register the relevant traceability links carelessly, leaving the information incomplete. This makes automated, tool-based collection of traceability links a tempting alternative, but it brings the opposite challenge: it generates too many potential trace relationships, not all of which are equally relevant. Objective: This paper evaluates how to rank such auto-generated trace relationships. Method: We present two approaches to this ranking: a Bayesian technique and a linear inference technique. Both techniques depend on the interaction event trails left behind by collaborating developers while working within a development tool. Results: The outcome of a preliminary study suggests an advantage for the linear approach. We also explore the challenges and potential of the two techniques. Conclusion: The advantage of the two techniques is that they can provide traceability insights that are contextual and would have been much more difficult to capture manually. We also present some key lessons learnt during this research.
