Software Diagnosis Using Fuzzified Attribute Base on Modified MEPA

Many data preprocessing methods exist, including data discretization, data cleaning, data integration and transformation, and data reduction. Concept hierarchies are a form of data discretization that can be used for data preprocessing. Discrete data are usually more compact and shorter than continuous data, and can be processed more quickly. We therefore propose a data discretization method, a modified minimize entropy principle approach (MEPA), that fuzzifies attributes before a classification tree is built. For verification, we apply the proposed method to two NASA software projects, KC2 and JM1, and establish a prototype system to discretize the data from these projects. In terms of both error rate and number of rules, the proposed approach performs better than the other methods compared.
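To make the idea of entropy-based discretization concrete, the following is a minimal sketch of choosing a single cut point for one numeric attribute by minimizing the weighted class entropy of the two resulting intervals. It is a generic illustration, not the modified MEPA proposed here: the function names `class_entropy` and `min_entropy_threshold`, the midpoint rule for candidate cut points, and the toy data are all assumptions made for this example.

```python
import math
from collections import Counter


def class_entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def min_entropy_threshold(values, labels):
    """Return (threshold, weighted_entropy) for the best binary split.

    Candidate thresholds are midpoints between adjacent distinct values
    (an assumption of this sketch). Returns (None, inf) if no split exists.
    """
    pairs = sorted(zip(values, labels))
    n = len(pairs)
    best = (None, float("inf"))
    for i in range(1, n):
        lo, hi = pairs[i - 1][0], pairs[i][0]
        if lo == hi:
            continue  # no cut point between equal values
        left = [lab for _, lab in pairs[:i]]
        right = [lab for _, lab in pairs[i:]]
        # Entropy of each side, weighted by the share of samples it holds.
        score = (len(left) * class_entropy(left)
                 + len(right) * class_entropy(right)) / n
        if score < best[1]:
            best = ((lo + hi) / 2, score)
    return best


# Toy data (hypothetical): one software metric and a defect label per module.
metric = [1, 2, 3, 10, 11, 12]
defect = ["ok", "ok", "ok", "defect", "defect", "defect"]
threshold, entropy = min_entropy_threshold(metric, defect)
```

Applying the same search recursively inside each interval would yield further cut points, which could then serve as anchors for fuzzy membership functions; how the modified approach does this is not specified in the abstract.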
