A Curated Benchmark Collection of Python Systems for Empirical Studies on Software Engineering

This paper presents a dataset of metrics associated with the first release of a curated collection of Python software systems. We describe the dataset, the criteria adopted, and the issues we faced while building the corpus. The dataset can enhance the reliability of empirical studies by enabling their reproducibility and reducing their cost, and it can foster further research on Python software.
