Three Strategies to Rule Induction from Data with Numerical Attributes

Rule induction from data with numerical attributes must be accompanied by discretization. Our main objective was to compare two discretization techniques, both based on cluster analysis, with a new rule induction algorithm called MLEM2, in which discretization is performed simultaneously with rule induction. MLEM2 is an extension of LEM2, an existing rule induction algorithm that is part of the LERS data mining system and works correctly only for symbolic attributes. For the two strategies based on cluster analysis, rules were induced by the LEM2 algorithm. Our results show that MLEM2 outperformed both strategies that combine cluster analysis with LEM2, in terms of complexity (size of rule sets and total number of conditions) and, more importantly, in terms of error rates.
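The three comparison criteria named above (number of rules, total number of conditions, error rate) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the rule representation, the helper names `matches`, `classify` and `rule_set_stats`, the toy data, and the first-matching-rule classification are hypothetical and are not the LERS classification scheme or the MLEM2 algorithm itself.

```python
from typing import Dict, List, Optional, Tuple, Union

# Hypothetical representation: a condition is either a symbolic value
# or a half-open numerical interval [lo, hi).
Condition = Union[str, Tuple[float, float]]
Rule = Tuple[Dict[str, Condition], str]  # (conditions, decision)


def matches(rule: Rule, case: Dict[str, object]) -> bool:
    """Return True if the case satisfies every condition of the rule."""
    conditions, _ = rule
    for attr, cond in conditions.items():
        value = case[attr]
        if isinstance(cond, tuple):
            lo, hi = cond
            if not (lo <= value < hi):
                return False
        elif value != cond:
            return False
    return True


def classify(rules: List[Rule], case: Dict[str, object]) -> Optional[str]:
    """Assumed scheme: the first matching rule decides; None if no rule matches."""
    for rule in rules:
        if matches(rule, case):
            return rule[1]
    return None


def rule_set_stats(rules, cases, labels):
    """Return (number of rules, total number of conditions, error rate)."""
    n_rules = len(rules)
    n_conditions = sum(len(conds) for conds, _ in rules)
    # An unmatched case (None) is counted as an error.
    errors = sum(classify(rules, c) != y for c, y in zip(cases, labels))
    return n_rules, n_conditions, errors / len(cases)


# Toy data, invented for illustration only.
rules = [
    ({"temp": (38.0, 42.0)}, "flu"),
    ({"temp": (35.0, 38.0), "cough": "no"}, "healthy"),
]
cases = [
    {"temp": 39.1, "cough": "yes"},
    {"temp": 36.6, "cough": "no"},
    {"temp": 37.0, "cough": "yes"},
]
labels = ["flu", "healthy", "healthy"]

n_rules, n_conditions, error_rate = rule_set_stats(rules, cases, labels)
```

On the toy data the rule set has two rules and three conditions, and the third case matches no rule, so it counts as the single error out of three cases. A smaller rule set with fewer conditions and a lower error rate would be considered better under the criteria used in the comparison.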
