The Qualitas Corpus: A Curated Collection of Java Code for Empirical Studies

To increase our ability to use measurement to support software development practice, we need to do more analysis of code. However, empirical studies of code are expensive, and their results are difficult to compare. We describe the Qualitas Corpus, a large curated collection of open source Java systems. The corpus reduces the cost of performing large empirical studies of code and supports comparison of measurements of the same artifacts. We discuss its design, its organisation, and the issues associated with its development.
