A new approach for discretizing continuous attributes in learning systems

Discretization converts continuous attributes into a discrete format so that they can be used for further data processing in learning systems. The main concern in discretization techniques is to find an optimal representation of continuous values using a limited number of intervals that effectively characterizes the data while minimizing information loss. In this paper, we propose a novel class-attribute interdependency discretization algorithm (termed NCAIC), which takes into account the data distribution and the interdependency between all classes and attributes. In the proposed solution, the upper approximation of rough sets is applied as a core part of the discretization algorithm, and the class-attribute mutual information is used to automatically control and adjust the scope of the discretization of continuous attributes. We report experiments comparing NCAIC with five other discretization algorithms on 13 benchmark datasets from the UCI database, using the well-known C4.5 decision tree tool as the classifier. The results demonstrate that, in general, the proposed algorithm outperforms the other tested discretization algorithms in terms of classification performance.
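To make the notion of class-attribute mutual information concrete, the following minimal Python sketch computes the standard mutual information (in bits) between class labels and the intervals induced by a set of cut points. It is an illustration only and not the NCAIC algorithm itself: the function names `discretize` and `class_attribute_mutual_information`, the toy data, and the cut points are all assumptions introduced here, and the rough-set upper approximation step is not modeled.

```python
import math
from bisect import bisect_right
from collections import Counter


def discretize(values, cut_points):
    """Map each continuous value to the index of the interval it falls in.

    A value equal to a cut point is assigned to the interval on its right.
    """
    cuts = sorted(cut_points)
    return [bisect_right(cuts, v) for v in values]


def class_attribute_mutual_information(intervals, labels):
    """Mutual information I(C; D) in bits between class labels C and
    discretized attribute values D, estimated from joint frequencies."""
    n = len(labels)
    joint = Counter(zip(labels, intervals))   # counts per (class, interval)
    class_totals = Counter(labels)            # counts per class
    interval_totals = Counter(intervals)      # counts per interval
    mi = 0.0
    for (c, r), count in joint.items():
        p_cr = count / n
        p_c = class_totals[c] / n
        p_r = interval_totals[r] / n
        mi += p_cr * math.log2(p_cr / (p_c * p_r))
    return mi


# Hypothetical toy data: two well-separated classes on one attribute.
values = [1.0, 1.5, 2.0, 5.0, 5.5, 6.0]
labels = ["a", "a", "a", "b", "b", "b"]

# A cut point between the classes preserves all class information.
good = class_attribute_mutual_information(discretize(values, [3.0]), labels)
# A cut point inside one class preserves much less.
poor = class_attribute_mutual_information(discretize(values, [1.2]), labels)
print(good, poor)
```

In this toy setting, the cut point that separates the two classes yields a higher mutual information than the cut point placed inside one class, which is the kind of signal a discretizer could use to decide whether further splitting or merging of intervals is worthwhile.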
