A discretization method for rough sets theory

Rough Sets Theory, a powerful knowledge-mining tool, has been widely applied to acquire knowledge in the medical, engineering and financial domains. However, it cannot be applied directly to real-world classification tasks that involve continuous features; such features must first be discretized. ChiMerge, first proposed in 1992, has become a widely used discretization method. The Chi2 algorithm is a modification of ChiMerge. It automates the discretization process by introducing an inconsistency rate as the stopping criterion, and it selects the significance level automatically. In addition, it incorporates a finer phase aimed at feature selection, which broadens the applications of the ChiMerge algorithm. However, neither ChiMerge nor Chi2 considers the inaccuracy inherent in the merging criterion. Moreover, the user-defined inconsistency rate of the Chi2 algorithm introduces further inaccuracy into the discretization process, which leads to over-merging. To overcome these two drawbacks, a new discretization method, termed the modified Chi2 algorithm, is proposed. Comparison studies of predictive accuracy show that the modified Chi2 algorithm outperforms the original Chi2 algorithm. Thus, a completely automatic discretization method for Rough Sets Theory has been realized.
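To make the two quantities named above concrete, the following is a minimal Python sketch of the chi-square merging statistic used by ChiMerge-style methods and of an inconsistency rate of the kind Chi2 uses as a stopping criterion. It is not the modified Chi2 algorithm proposed here. The function names, the input layout (per-class counts for each interval, and rows of already-discretized attribute values) and the handling of zero expected counts (such cells are simply skipped) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def chi_square(counts_a, counts_b):
    """Chi-square statistic for two adjacent intervals.

    counts_a and counts_b hold the number of examples of each class
    that fall in the first and second interval (hypothetical layout).
    """
    rows = [counts_a, counts_b]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    n = sum(row_totals)
    stat = 0.0
    for row, r_total in zip(rows, row_totals):
        for observed, c_total in zip(row, col_totals):
            expected = r_total * c_total / n
            # Assumption: cells with zero expected count are skipped.
            if expected > 0:
                stat += (observed - expected) ** 2 / expected
    return stat


def inconsistency_rate(rows, labels):
    """Share of examples that disagree with the majority class among
    examples having identical (discretized) attribute values."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    inconsistent = sum(
        sum(c.values()) - max(c.values()) for c in groups.values()
    )
    return inconsistent / len(labels)
```

A low chi-square value indicates that the class distributions of two adjacent intervals are similar, so the intervals are candidates for merging; the inconsistency rate measures how much class information the resulting discretization has lost.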
