Initial evaluation of data quality in a TSP software engineering project data repository

To meet critical business challenges, software development teams need data to effectively manage product quality, cost, and schedule. The Team Software Process℠ (TSP℠) provides a framework that teams use to collect software process data in real time, using a defined, disciplined process. These data hold promise for use in software engineering research. We combined data from 109 industrial projects into a database to support performance benchmarking and model development. But are the data of sufficient quality to draw conclusions? We applied various tests and techniques to identify data anomalies that affect the quality of the data in several dimensions. In this paper, we report some initial results of our analysis, describing the number and rates of identified anomalies and suspect data, covering incorrectness, inconsistency, and credibility. To illustrate the types of data available for analysis, we provide three examples. The preliminary results of this empirical study suggest that some aspects of the data quality are good and the data are generally credible, but size data are often missing.
