The IlmSeven Dataset

Developing new ideas and algorithms, or comparing new findings, in the field of requirements engineering and management requires a dataset to work with. Collecting the required data is time-consuming, tedious, and may involve unforeseen difficulties. The need for datasets often forces researchers to collect data themselves in order to evaluate their findings. However, comparing results with other publications is especially difficult when those results are based on proprietary datasets. A major obstacle is reproducing a previously used dataset, which may involve subtle preprocessing steps not explicitly mentioned by the original authors. Providing a predefined dataset avoids these problems: it establishes a common baseline and enables direct comparison for benchmarking. This paper provides a well-defined dataset consisting of seven open-source software projects. It contains a large number of typed development artifacts and links between them. Enriched with additional metadata, such as timestamps, versions, and component information, the dataset allows answering a broad range of research questions.
