An application of a rule-based model in software quality classification

A new rule-based classification model (RBCM) and a rule-based model selection technique are presented. The RBCM uses rough set theory to significantly reduce the number of attributes, discretization to partition the domain of attribute values, and Boolean predicates to generate the decision rules that comprise the model. When the domain of an attribute is continuous and relatively large, rough set theory requires that it be discretized. The resulting discretized domain must preserve the characteristics of the original domain values. However, this can lead to a large number of partitions of the attribute's domain space, which in turn leads to large rule sets. These rule sets tend to form models that over-fit. To address this issue, the proposed rule-based model adopts a new model selection strategy that minimizes over-fitting for the RBCM. The RBCM is validated empirically through a case study on a large legacy telecommunications system. The results demonstrate that the proposed RBCM and the model selection strategy are effective in identifying the classification model that minimizes over-fitting and high-cost classification errors.
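The link between finer discretization and larger rule sets can be illustrated with a minimal sketch. It is not the RBCM itself: the equal-frequency binning, the majority-vote rule induction, the function names, and the toy module measurements (lines of code, cyclomatic complexity) with their fault-prone ("fp") and not-fault-prone ("nfp") labels are all assumptions invented for illustration.

```python
from bisect import bisect_right
from collections import defaultdict


def equal_frequency_cuts(values, n_bins):
    """Return n_bins - 1 cut points that split the sorted values into roughly equal groups."""
    ordered = sorted(values)
    return [ordered[len(ordered) * k // n_bins] for k in range(1, n_bins)]


def discretize(value, cuts):
    """Map a continuous value to the index of the interval it falls in."""
    return bisect_right(cuts, value)


def induce_rules(rows, labels, n_bins):
    """Build one rule per distinct discretized attribute tuple (majority label wins)."""
    columns = list(zip(*rows))
    cuts_per_attr = [equal_frequency_cuts(col, n_bins) for col in columns]
    counts = defaultdict(lambda: defaultdict(int))
    for row, label in zip(rows, labels):
        key = tuple(discretize(v, c) for v, c in zip(row, cuts_per_attr))
        counts[key][label] += 1
    return {key: max(votes, key=votes.get) for key, votes in counts.items()}


# Hypothetical toy data: (lines of code, cyclomatic complexity) per module.
rows = [(120, 4), (300, 9), (80, 2), (950, 30),
        (640, 22), (210, 7), (1500, 41), (60, 1)]
labels = ["nfp", "nfp", "nfp", "fp", "fp", "nfp", "fp", "nfp"]

for n_bins in (2, 4, 8):
    rules = induce_rules(rows, labels, n_bins)
    print(n_bins, "bins ->", len(rules), "rules")
```

On this toy data the rule count doubles each time the number of bins doubles, until every module has its own rule. A rule set of that size memorizes the training data, which is the over-fitting risk that a model selection strategy has to guard against.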
