A two-stage discretization algorithm based on information entropy

Discretization is an important and difficult preprocessing task in data mining and knowledge discovery. Although numerous discretization approaches exist, many suffer from certain drawbacks. Local approaches are efficient, but their generalization ability is weak. Global approaches consider all attributes simultaneously, but they have high time and space complexity. In this paper, we propose a two-stage discretization (TSD) algorithm based on information entropy. In the local discretization stage, we select k strong cuts for each attribute independently so as to minimize conditional entropy. The goal is to rapidly reduce the cardinality of the attributes with minor information loss. In the global discretization stage, the cuts for all attributes are considered simultaneously to form a scaled decision system, and the minimal cut set that preserves the positive region is then selected. We tested the new algorithm and seven popular algorithms on 28 datasets. Compared with the other approaches, our algorithm has the best generalization ability and the highest classification accuracy, along with good information-preserving ability and reasonable time consumption.
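The two stages can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the authors' implementation: candidate cuts are assumed to be midpoints between consecutive distinct values; the local stage greedily adds, one at a time, the candidate cut giving the lowest conditional entropy H(D | a) until k cuts are chosen; and the global stage uses backward elimination, dropping any cut whose removal leaves the positive region of the scaled decision system unchanged. The function names (`local_stage`, `global_stage`, `tsd`) are hypothetical, and the paper's definition of a "strong" cut and its search for the minimal cut set may differ.

```python
import math
from collections import Counter

# Sketch under stated assumptions; not the authors' TSD implementation.


def entropy(labels):
    """Shannon entropy (base 2) of a list of decision labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def interval_index(value, cuts):
    """Index of the interval that `value` falls into, given a collection of cut points."""
    return sum(value > c for c in cuts)


def conditional_entropy(values, labels, cuts):
    """H(D | a) after discretizing one attribute's values with the given cuts."""
    bins = {}
    for v, y in zip(values, labels):
        bins.setdefault(interval_index(v, cuts), []).append(y)
    n = len(labels)
    return sum(len(ys) / n * entropy(ys) for ys in bins.values())


def candidate_cuts(values):
    """Assumption: candidate cuts are midpoints between consecutive distinct values."""
    s = sorted(set(values))
    return [(lo + hi) / 2 for lo, hi in zip(s, s[1:])]


def local_stage(values, labels, k):
    """Assumed greedy forward selection of up to k cuts for one attribute."""
    chosen, candidates = [], candidate_cuts(values)
    for _ in range(min(k, len(candidates))):
        # Pick the cut that, added to those already chosen, minimizes H(D | a).
        best = min(candidates,
                   key=lambda c: conditional_entropy(values, labels, chosen + [c]))
        chosen.append(best)
        candidates.remove(best)
    return sorted(chosen)


def positive_region(rows, labels, cuts):
    """Objects whose class in the scaled decision system has a single decision value."""
    classes = {}
    for i, row in enumerate(rows):
        # Scale each attribute j using only the cuts (j, c) that belong to it.
        key = tuple(interval_index(v, [c for (a, c) in cuts if a == j])
                    for j, v in enumerate(row))
        classes.setdefault(key, []).append(i)
    pos = set()
    for members in classes.values():
        if len({labels[i] for i in members}) == 1:
            pos.update(members)
    return pos


def global_stage(rows, labels, cuts):
    """Assumed backward elimination: drop cuts whose removal keeps the positive region."""
    target = positive_region(rows, labels, cuts)
    kept = list(cuts)
    for cut in list(cuts):
        trial = [c for c in kept if c != cut]
        if positive_region(rows, labels, trial) == target:
            kept = trial
    return kept


def tsd(rows, labels, k):
    """Two-stage sketch: local cuts per attribute, then a global reduction of their union."""
    cuts = []
    for j in range(len(rows[0])):
        column = [row[j] for row in rows]
        cuts.extend((j, c) for c in local_stage(column, labels, k))
    return global_stage(rows, labels, cuts)


if __name__ == "__main__":
    rows = [(1.0, 5.0), (2.0, 4.0), (3.0, 3.0), (4.0, 2.0), (5.0, 1.0), (6.0, 0.0)]
    labels = ["a", "a", "a", "b", "b", "b"]
    print(tsd(rows, labels, k=2))  # [(1, 2.5)]
```

On this toy data, the local stage keeps two cuts per attribute, and the global stage reduces their union to the single cut 2.5 on the second attribute, because that cut alone already separates the two decision classes and so preserves the positive region. Backward elimination returns a cut set from which no single cut can be removed without shrinking the positive region; it is not guaranteed to be the smallest such set.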
