A Modified Chi2 Algorithm Based on the Significance of Attribute

Discretization is an important component of data preprocessing: it turns numeric attributes into discrete ones. Many discretization methods exist. This paper describes the Chi2 algorithm, a simple and general discretization algorithm that uses the chi2 statistic as the criterion for discretizing numeric attributes. However, the Chi2 algorithm does not consider the order in which attributes are discretized in its second phase, and the inconsistency rate cannot fully reflect the characteristics of the dataset. These drawbacks ultimately affect the discretization result. In this paper, some concepts from rough set theory are introduced to improve the Chi2 algorithm.
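To make the two quantities named above concrete, the following is a minimal Python sketch, not the paper's implementation. It computes the chi2 statistic for a pair of adjacent intervals from their per-class counts, and the inconsistency rate of a discretized dataset. The function names, the `min_expected` parameter, and its default of 0.1 (a substitute for an expected frequency of zero) are assumptions made for illustration.

```python
from collections import Counter, defaultdict


def chi2_adjacent(counts_a, counts_b, min_expected=0.1):
    """Chi-square statistic for two adjacent intervals.

    counts_a and counts_b hold the number of instances of each class
    that fall into the first and second interval (same class order).
    """
    rows = [counts_a, counts_b]
    row_totals = [sum(r) for r in rows]
    col_totals = [a + b for a, b in zip(counts_a, counts_b)]
    n = sum(row_totals)
    if n == 0:
        return 0.0
    stat = 0.0
    for i, row in enumerate(rows):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            if expected == 0:
                # Assumption: replace a zero expected frequency with a
                # small constant to avoid dividing by zero.
                expected = min_expected
            stat += (observed - expected) ** 2 / expected
    return stat


def inconsistency_rate(rows, labels):
    """Fraction of instances that disagree with the majority class
    among instances sharing the same (discretized) attribute values."""
    groups = defaultdict(Counter)
    for row, label in zip(rows, labels):
        groups[tuple(row)][label] += 1
    inconsistent = sum(
        sum(c.values()) - max(c.values()) for c in groups.values()
    )
    return inconsistent / len(labels)
```

A low chi2 value for two adjacent intervals suggests their class distributions are similar, which makes the pair a candidate for merging; the inconsistency rate indicates how much class information the discretized data has lost.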
