Analyzing software effort estimation using k means clustered regression approach

Software estimation is the area of software development in which more assurances have been broken than in any other. Numerous new software effort estimation techniques have been proposed, but no consensus has been reached on which of them are the most appropriate. Because of the intangible nature of software, highly accurate effort estimation remains out of reach for developers. Very accurate estimates of development effort are unlikely, given the inherent uncertainty in software projects and the complex, dynamic interaction of the factors that affect software development. Software engineering datasets are heterogeneous because the data are obtained from diverse sources. This heterogeneity can be reduced by relating similar data values to one another and grouping them into clusters. This study examines how combining clustering with regression techniques can mitigate the loss of predictive accuracy caused by heterogeneous data. A clustered approach creates subsets of data with a degree of homogeneity that enhances prediction accuracy. The study also observed that ridge regression performs better than the other regression techniques. Another key finding is that selecting a subset of highly predictive attributes using grey relational analysis yields a significant improvement in prediction.
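The clustered regression idea described above can be illustrated with a minimal sketch. It is not the study's implementation: the synthetic project data, the function names (`kmeans`, `fit_ridge`, `fit_clustered_ridge`, `predict`), the choice of k = 2, and the ridge penalty `alpha` are all assumptions made for illustration, and attribute selection by grey relational analysis is left out. The sketch partitions projects with k-means, fits one ridge model per cluster, and compares the result with a single ridge model fitted to all projects.

```python
import numpy as np

def kmeans(X, k, n_iter=100, seed=0):
    """Plain Lloyd's k-means (illustrative); returns (centroids, labels)."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        # Assign each project to its nearest centroid.
        dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
        labels = dists.argmin(axis=1)
        # Recompute centroids; keep the old one if a cluster is empty.
        new = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new, centroids):
            break
        centroids = new
    # Final assignment so labels always match the returned centroids.
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    return centroids, dists.argmin(axis=1)

def fit_ridge(X, y, alpha=1.0):
    """Closed-form ridge regression with an unpenalized intercept."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xc, yc = X - x_mean, y - y_mean
    w = np.linalg.solve(Xc.T @ Xc + alpha * np.eye(X.shape[1]), Xc.T @ yc)
    return w, y_mean - x_mean @ w

def fit_clustered_ridge(X, y, k=2, alpha=1.0):
    """Cluster the projects, then fit one ridge model per cluster."""
    centroids, labels = kmeans(X, k)
    models = [fit_ridge(X[labels == j], y[labels == j], alpha) for j in range(k)]
    return centroids, models

def predict(centroids, models, X):
    """Route each project to its nearest cluster's ridge model."""
    dists = np.linalg.norm(X[:, None, :] - centroids[None, :, :], axis=2)
    labels = dists.argmin(axis=1)
    return np.array([X[i] @ models[j][0] + models[j][1]
                     for i, j in enumerate(labels)])

# Synthetic, hypothetical data: two groups of projects whose effort
# depends on the two attributes in different ways (heterogeneity).
rng = np.random.default_rng(42)
small = rng.uniform(1, 10, size=(40, 2))
large = rng.uniform(50, 60, size=(40, 2))
X = np.vstack([small, large])
y = np.concatenate([
    small @ np.array([2.0, 1.0]) + 5.0,
    large @ np.array([10.0, -3.0]) - 100.0,
])

# Single global ridge model versus one ridge model per cluster.
w_global, b_global = fit_ridge(X, y)
mae_global = np.mean(np.abs(X @ w_global + b_global - y))

centroids, models = fit_clustered_ridge(X, y, k=2)
pred = predict(centroids, models, X)
mae_clustered = np.mean(np.abs(pred - y))

print(f"global ridge MAE:    {mae_global:.3f}")
print(f"clustered ridge MAE: {mae_clustered:.3f}")
```

In practice the attributes would usually be standardized before clustering, since k-means relies on Euclidean distance, and the comparison would be made on held-out projects rather than on the training data as in this sketch.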
