Data mining in software metrics databases

We investigate the use of data mining for the analysis of software metrics databases, and some of the issues in this application domain. Software metrics are collected at various phases of the software development process in order to monitor and control the quality of a software product. However, software quality control is complicated by the intricate relationship between these metrics and the attributes of the software development process. Data mining has been proposed as a technology for supporting and enhancing our understanding of software metrics and their relationship to software quality. In this paper, we use fuzzy clustering to investigate three datasets of software metrics, and we also address the broader question of whether supervised or unsupervised learning is more appropriate for software engineering problems. Our findings generally confirm the known linear relationship between metrics and change rates, although we also observe some interesting behaviors. In addition, our results partly contradict earlier studies that investigated these datasets using correlation analysis alone. These results illustrate how intelligent technologies can augment traditional statistical inference in software quality control.
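The abstract does not say which fuzzy clustering algorithm was applied. As a hedged illustration only, the sketch below implements fuzzy c-means (FCM), a widely used fuzzy clustering method, in plain Python. The function name `fuzzy_c_means`, the default fuzzifier m = 2, the random initialization, and the toy module metrics are all assumptions made for this example; they are not the authors' implementation or data. The method is unsupervised: no change-rate labels are used, and each module receives a degree of membership in every cluster rather than a single hard label, which is what allows modules lying between groups to show up.

```python
import math
import random


def fuzzy_c_means(points, c, m=2.0, max_iter=300, tol=1e-6, seed=0):
    """Minimal fuzzy c-means sketch (illustrative, not the paper's code).

    points: list of equal-length numeric tuples (e.g. per-module metrics).
    c: number of clusters; m > 1 is the fuzzifier.
    Returns (centers, memberships); memberships[k][i] is the degree to which
    point k belongs to cluster i, and each row sums to 1.
    """
    if m <= 1:
        raise ValueError("fuzzifier m must be > 1")
    rng = random.Random(seed)
    n, d = len(points), len(points[0])

    # Random initial memberships, normalised so each point's row sums to 1.
    u = []
    for _ in range(n):
        row = [rng.random() for _ in range(c)]
        total = sum(row)
        u.append([r / total for r in row])

    centers = []
    for _ in range(max_iter):
        # Centre update: mean of the points weighted by membership ** m.
        centers = []
        for i in range(c):
            w = [u[k][i] ** m for k in range(n)]
            w_sum = sum(w)
            centers.append(tuple(
                sum(w[k] * points[k][j] for k in range(n)) / w_sum
                for j in range(d)))

        # Membership update: u_ki = 1 / sum_j (d_ki / d_kj) ** (2 / (m - 1)).
        exponent = 2.0 / (m - 1.0)
        new_u = []
        for k in range(n):
            dist = [math.dist(points[k], centers[i]) for i in range(c)]
            zero = [i for i in range(c) if dist[i] == 0.0]
            if zero:
                # Point coincides with a centre: split membership among those centres.
                row = [1.0 / len(zero) if i in zero else 0.0 for i in range(c)]
            else:
                row = [1.0 / sum((dist[i] / dist[j]) ** exponent for j in range(c))
                       for i in range(c)]
            new_u.append(row)

        # Stop once no membership changes by more than tol.
        delta = max(abs(new_u[k][i] - u[k][i]) for k in range(n) for i in range(c))
        u = new_u
        if delta < tol:
            break
    return centers, u


# Hypothetical toy data: (size in KLOC, number of changes) per module.
# These values are invented for illustration, not taken from the paper's datasets.
modules = [(0.12, 2), (0.15, 3), (0.13, 1), (0.90, 25), (0.95, 30), (0.88, 22)]
centers, memberships = fuzzy_c_means(modules, c=2)
for metrics, row in zip(modules, memberships):
    print(metrics, [round(v, 2) for v in row])
```

With these toy values the small, rarely changed modules receive high membership in one cluster and the large, frequently changed modules in the other. On real metrics data the features would normally be standardized first, because FCM uses Euclidean distance and the metric with the largest numeric range would otherwise dominate.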