An empirical study of process-related attributes in segmented software cost-estimation relationships

Parametric software effort estimation models consisting of a single mathematical relationship suffer from poor fit and predictive accuracy when the historical database contains data from projects of a heterogeneous nature. Segmenting the input domain according to clusters obtained from the database of historical projects enables more realistic models that use several local estimation relationships. Nonetheless, it may be hypothesized that applying clustering algorithms without first considering the influence of well-known project attributes misses the opportunity to obtain more realistic segments. In this paper, we describe the results of an empirical study, using the ISBSG-8 database and the EM clustering algorithm, of the influence of two process-related attributes used as drivers of the clustering process: the use of engineering methodologies and the use of CASE tools. The results provide evidence that considering these attributes significantly conditions the final model obtained, even though the resulting predictive quality is of a similar magnitude.
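
The idea of segmented estimation can be illustrated with a minimal sketch. It is not the study's implementation: it assumes a simplified one-dimensional, two-component Gaussian mixture fitted by EM on the logarithm of project size, followed by one local power-law relationship (effort = a * size^b) per cluster. The function names `fit_power_law`, `em_two_clusters`, and `segment_and_fit` are hypothetical, and projects are assumed to be plain `(size, effort)` pairs rather than ISBSG records.

```python
import math

def fit_power_law(projects):
    """Least-squares fit of log(effort) = log(a) + b * log(size).

    `projects` is a list of (size, effort) pairs with at least two
    distinct sizes. Returns the pair (a, b).
    """
    xs = [math.log(size) for size, _ in projects]
    ys = [math.log(effort) for _, effort in projects]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = math.exp(mean_y - b * mean_x)
    return a, b

def em_two_clusters(values, iterations=50):
    """EM for a 1-D mixture of two Gaussians; returns hard labels (0 or 1).

    Assumption: means start at the minimum and maximum of the data,
    so component 0 ends up as the lower cluster.
    """
    lo, hi = min(values), max(values)
    means = [lo, hi]
    variances = [max((hi - lo) ** 2 / 4, 1e-6)] * 2
    weights = [0.5, 0.5]
    resp = []
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            dens = [
                w * math.exp(-((v - m) ** 2) / (2 * s)) / math.sqrt(2 * math.pi * s)
                for w, m, s in zip(weights, means, variances)
            ]
            total = sum(dens) or 1e-300  # guard against underflow
            resp.append([d / total for d in dens])
        # M-step: re-estimate weight, mean, and variance per component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-12:
                continue
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            spread = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values))
            variances[k] = max(spread / nk, 1e-6)  # floor avoids collapse
            weights[k] = nk / len(values)
    return [0 if r[0] >= r[1] else 1 for r in resp]

def segment_and_fit(projects):
    """Cluster projects on log(size), then fit one local model per cluster."""
    labels = em_two_clusters([math.log(size) for size, _ in projects])
    models = {}
    for k in (0, 1):
        members = [p for p, label in zip(projects, labels) if label == k]
        if len(members) >= 2:
            models[k] = fit_power_law(members)
    return labels, models
```

To use a process-related attribute as a driver in this sketch, one option would be to partition the projects by that attribute first (for example, projects that used CASE tools versus those that did not) and call `segment_and_fit` on each partition separately, then compare the resulting local models.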
