A Practical Way to Use Clustering and Context Knowledge for Software Project Planning

The use of empirical data from past projects for project planning is becoming increasingly important in engineering-style software development. Because every software development project is unique, experience from past projects cannot be reused directly. Nevertheless, common patterns can often be found when past projects are compared, and this information can be used to support project planning. This article sketches the SPRINT I technique for project planning and control. The approach is based on the use and analysis of context-oriented cluster curves. The article focuses on two aspects: how to identify similar projects and build so-called clusters with typical data curves (such as effort distribution), and how to characterize these clusters using context knowledge. This characterization allows a new project to be assigned to a cluster in order to obtain a prediction. Results from an evaluation with data from 25 projects show that the technique provides a practical way to increase the accuracy of software project planning by using empirical data.
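
To make the two steps concrete, the following is a minimal sketch of the general idea, not the SPRINT I algorithm itself, whose details the abstract does not give. It assumes that each past project is described by an effort-distribution curve (share of effort per phase) and by a single context attribute. It groups projects by simple agglomerative merging on Euclidean distance, labels each cluster with its most common context value, and uses the mean curve of the matching cluster as the prediction for a new project. All project names, numbers, the context attribute, and the function names are hypothetical.

```python
from collections import Counter
from itertools import combinations

# Hypothetical past projects: share of total effort per phase
# (e.g. design, implementation, test). Values are illustrative only.
curves = {
    "P1": [0.20, 0.50, 0.30],
    "P2": [0.22, 0.48, 0.30],
    "P3": [0.40, 0.30, 0.30],
    "P4": [0.42, 0.28, 0.30],
}
# Hypothetical context knowledge: one attribute per project.
context = {"P1": "embedded", "P2": "embedded", "P3": "web", "P4": "web"}


def distance(a, b):
    """Euclidean distance between two curves of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5


def mean_curve(names):
    """Point-wise mean of the curves of the named projects."""
    return [sum(v) / len(v) for v in zip(*(curves[n] for n in names))]


def cluster_projects(k):
    """Repeatedly merge the two clusters with the closest mean curves
    until only k clusters remain (an assumed, simple merging rule)."""
    clusters = [[name] for name in curves]
    while len(clusters) > k:
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: distance(mean_curve(clusters[p[0]]),
                                   mean_curve(clusters[p[1]])),
        )
        merged = clusters.pop(j)  # j > i, so index i stays valid
        clusters[i].extend(merged)
    return clusters


def characterize(cluster):
    """Label a cluster with its most common context value."""
    return Counter(context[n] for n in cluster).most_common(1)[0][0]


def predict(new_context, clusters):
    """Return the typical curve of the cluster whose label matches
    the new project's context, or None if no cluster matches."""
    for cluster in clusters:
        if characterize(cluster) == new_context:
            return mean_curve(cluster)
    return None


clusters = cluster_projects(k=2)
prediction = predict("web", clusters)
```

With this toy data, the two "embedded" projects and the two "web" projects end up in separate clusters, and a new "web" project is predicted to follow the mean curve of P3 and P4. A realistic setting would use several context attributes and a more careful similarity measure; both are left out here to keep the sketch short.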
