Selecting Open Source Projects for Traceability Case Studies

[Context & Motivation] Once research questions and initial theories have taken shape, empirical research typically requires selecting cases in which to study the underlying ideas. Issue trackers of today's open source systems (OSS) are a gold mine for empirical research, not least for studying trace links among the issue artifacts they contain. [Question / problem] The huge number of available OSS projects complicates the process of finding suitable cases to support the research goals. Further, simply picking a large number of projects at random does not imply generalizability. Therefore, the selection process should be carefully designed. [Principal ideas / results] In this paper, we propose a method for choosing OSS projects to study trace links found in issue tracking systems. Building upon purposive sampling and cluster analysis, the method identifies relevant project characteristics and filters out irrelevant information. Every step of the method is demonstrated on a live example. [Contributions] The proposed strategy selects an information-rich, representative, and diverse sample of OSS projects for a traceability case study. Our work may be used as a practical guide by other researchers performing project selection tasks.
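The general idea of combining cluster analysis with purposive sampling can be illustrated with a minimal sketch. It is not the paper's actual procedure: the project names, the two characteristics (issue count and contributor count), the use of k-means, and the rule of picking the project closest to each cluster centre are all assumptions made for illustration.

```python
import math
import random


def standardize(rows):
    """Z-score each column so no single characteristic dominates distances."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    # Fall back to 1.0 when a column is constant (standard deviation of zero).
    stds = [
        math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
        for c, m in zip(cols, means)
    ]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]


def nearest(point, centroids):
    """Index of the centroid closest to the given point."""
    return min(range(len(centroids)), key=lambda j: math.dist(point, centroids[j]))


def kmeans(points, k, iters=50, seed=0):
    """Plain k-means with a fixed seed; returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(p) for p in rng.sample(points, k)]
    for _ in range(iters):
        labels = [nearest(p, centroids) for p in points]
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster runs empty
                centroids[j] = [sum(c) / len(members) for c in zip(*members)]
    labels = [nearest(p, centroids) for p in points]
    return labels, centroids


def select_representatives(names, rows, k, seed=0):
    """Pick, per cluster, the project closest to the cluster centre."""
    points = standardize(rows)
    labels, centroids = kmeans(points, k, seed=seed)
    chosen = {}
    for j in range(k):
        members = [i for i, lab in enumerate(labels) if lab == j]
        if members:
            best = min(members, key=lambda i: math.dist(points[i], centroids[j]))
            chosen[j] = names[best]
    return chosen


# Hypothetical projects described by (number of issues, number of contributors).
names = ["a", "b", "c", "d", "e", "f"]
rows = [(100, 5), (120, 6), (110, 4), (9000, 80), (9500, 90), (8800, 85)]
chosen = select_representatives(names, rows, k=2)
print(chosen)
```

Choosing one project per cluster keeps the sample diverse across the characteristics considered, whereas a purely random draw could, by chance, concentrate on one kind of project.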
