Discretization of Rational Data

Frequently one wants to extend a classification method that in principle requires records with True/False values so that records with rational numbers can be processed. In such cases, the rational numbers must first be replaced by True/False values before the method can be applied. In other cases, a classification method can in principle process records with rational numbers directly, but replacing them with True/False values improves the method's performance. The replacement process is usually called discretization or binarization. This paper describes a recursive discretization process called Cutpoint. The key step of Cutpoint detects points where classification patterns change abruptly. The paper includes computational results in which Cutpoint is compared with entropy-based methods, which to date have been found to be the best discretization schemes. The results indicate that Cutpoint is preferred by certain classification schemes, while entropy-based methods are better for others. Thus, Cutpoint may be viewed as an additional discretization tool that one may want to consider.
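To make the replacement process concrete, the following is a minimal sketch of supervised binarization with a single threshold. It is not the paper's Cutpoint algorithm; it only illustrates the general idea shared by such schemes: candidate cutpoints are placed between adjacent sorted values whose class labels differ, one cut is selected by a quality measure (here, simple classification accuracy of the induced split, as an assumed stand-in for the paper's pattern-change criterion), and the rational values are then replaced by True/False relative to that cut.

```python
# Sketch of supervised binarization via a single cutpoint.
# NOT the paper's Cutpoint algorithm; the accuracy criterion below is an
# illustrative assumption, not the pattern-change test the paper describes.

def candidate_cutpoints(values, labels):
    """Midpoints between adjacent sorted values whose class labels differ."""
    pairs = sorted(zip(values, labels))
    cuts = []
    for (v1, c1), (v2, c2) in zip(pairs, pairs[1:]):
        if c1 != c2 and v1 != v2:
            cuts.append((v1 + v2) / 2.0)
    return cuts

def best_cutpoint(values, labels):
    """Pick the candidate cut whose True/False split, with majority-class
    prediction on each side, classifies the most records correctly."""
    best, best_acc = None, -1.0
    n = len(values)
    for cut in candidate_cutpoints(values, labels):
        left = [c for v, c in zip(values, labels) if v <= cut]
        right = [c for v, c in zip(values, labels) if v > cut]
        correct = max(left.count(c) for c in set(left)) + \
                  max(right.count(c) for c in set(right))
        acc = correct / n
        if acc > best_acc:
            best, best_acc = cut, acc
    return best

def binarize(values, cut):
    """Replace rational values by True/False relative to the cutpoint."""
    return [v > cut for v in values]

vals = [0.1, 0.4, 0.35, 2.0, 2.5, 3.1]
labs = ['A', 'A', 'A', 'B', 'B', 'B']
cut = best_cutpoint(vals, labs)       # midpoint between 0.4 and 2.0
print(binarize(vals, cut))            # → [False, False, False, True, True, True]
```

A full discretization scheme applies such a step recursively, producing several cuts per attribute; the quality of the chosen criterion (entropy, accuracy, or abrupt pattern change as in Cutpoint) is what distinguishes the methods the paper compares.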
