An Approach to Outlier Detection of Software Measurement Data using the K-means Clustering Method

The quality of software measurement data affects how accurately project managers can make decisions with estimation or prediction models and how well they understand the real status of a project. During software measurement, outliers that reduce data quality are collected along with valid data, and detecting them is not easy. To cope with this problem, we propose an approach to outlier detection in software measurement data using the k-means clustering method.
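The abstract does not describe the details of the approach, so the following is only a minimal sketch of one common way to use k-means for outlier detection, not the authors' method. It clusters the data with Lloyd's algorithm and flags any point whose distance to its own cluster centroid is much larger than the median distance within that cluster. The function names, the deterministic initialization, the `factor` threshold, and the sample (size, effort) values are all assumptions made for illustration.

```python
import math
import statistics


def kmeans(points, k, iters=100):
    """Plain Lloyd's k-means on a list of equal-length numeric tuples.

    Assumption: centroids are initialized deterministically from evenly
    spaced positions in the sorted data, so the sketch is reproducible.
    """
    n = len(points)
    ordered = sorted(points)
    centroids = [ordered[((2 * i + 1) * n) // (2 * k)] for i in range(k)]
    labels = None
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        new_labels = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        if new_labels == labels:
            break  # assignments are stable, so the centroids are too
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids


def detect_outliers(points, k, factor=3.0):
    """Flag points lying far from their centroid (hypothetical rule).

    A point is flagged when its distance to its centroid exceeds
    `factor` times the median distance within the same cluster.
    """
    labels, centroids = kmeans(points, k)
    dists = [math.dist(p, centroids[lab]) for p, lab in zip(points, labels)]
    flagged = []
    for c in range(k):
        cluster = [d for d, lab in zip(dists, labels) if lab == c]
        if not cluster:
            continue
        limit = factor * statistics.median(cluster)
        flagged += [
            i for i, (d, lab) in enumerate(zip(dists, labels))
            if lab == c and d > limit
        ]
    return [points[i] for i in sorted(flagged)]


# Hypothetical (size, effort) measurements; the last record has a small
# size but a very large effort, as a data-entry error might produce.
measurements = [
    (10, 12), (11, 13), (12, 12), (10, 11),
    (50, 55), (51, 54), (52, 56), (49, 55),
    (12, 90),
]
flagged = detect_outliers(measurements, k=2)
```

In this sketch the suspicious record joins the nearer of the two clusters and pulls its centroid, yet still lies several times farther from that centroid than the other members, which is why a median-based limit is used instead of a mean-based one. On real measurement data the features would normally be scaled first, and both `k` and `factor` would need tuning.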
