Discretization of Rational Data

Frequently one wants to extend a classification method that in principle requires records with True/False values so that records with rational numbers can be processed. In such cases, the rational numbers must first be replaced by True/False values before the method can be applied. In other cases, a classification method can in principle process records with rational numbers directly, but replacing them with True/False values improves the method's performance. The replacement process is usually called discretization or binarization. This paper describes a recursive discretization process called Cutpoint. The key step of Cutpoint detects points where classification patterns change abruptly. The paper includes computational results in which Cutpoint is compared with entropy-based methods, which to date have been found to be the best discretization schemes. The results indicate that Cutpoint is preferred by certain classification schemes, while entropy-based methods are better for others. Thus, Cutpoint may be viewed as an additional discretization tool that one may want to consider.
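To make the replacement process concrete, the following is a minimal sketch of supervised binarization with a single threshold. It is not the paper's Cutpoint algorithm; it only illustrates the general idea shared by such schemes: candidate cutpoints are placed between adjacent sorted values whose class labels differ, one cut is selected by a quality measure (here, simple classification accuracy of the induced split, as an assumed stand-in for the paper's pattern-change criterion), and the rational values are then replaced by True/False relative to that cut.

```python
# Sketch of supervised binarization via a single cutpoint.
# NOT the paper's Cutpoint algorithm; the accuracy criterion below is an
# illustrative assumption, not the pattern-change test the paper describes.

def candidate_cutpoints(values, labels):
    """Midpoints between adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2.0)
    return cuts

def best_cutpoint(values, labels):
    """Pick the candidate cut whose True/False split, with majority-class
    prediction on each side, classifies the most records correctly."""
    best, best_acc = None, -1.0
    n = len(values)
    for cut in candidate_cutpoints(values, labels):
        left = [c for v, c in zip(values, labels) if v <= cut]
        right = [c for v, c in zip(values, labels) if v > cut]
        correct = max(left.count(c) for c in set(left)) + \
                  max(right.count(c) for c in set(right))
        acc = correct / n
        if acc > best_acc:
            best, best_acc = cut, acc
    return best

def binarize(values, cut):
    """Replace rational values by True/False relative to the cutpoint."""
    return [v > cut for v in values]

vals = [0.1, 0.4, 0.35, 2.0, 2.5, 3.1]
labs = ['A', 'A', 'A', 'B', 'B', 'B']
cut = best_cutpoint(vals, labs)       # midpoint between 0.4 and 2.0
print(binarize(vals, cut))            # → [False, False, False, True, True, True]
```

A full discretization scheme applies such a step recursively, producing several cuts per attribute; the quality of the chosen criterion (entropy, accuracy, or abrupt pattern change as in Cutpoint) is what distinguishes the methods the paper compares.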
