A Rule-Based Classifier with Accurate and Fast Rule Term Induction for Continuous Attributes

Rule-based classifiers are considered more expressive, more human readable and less prone to over-fitting than decision trees, especially when the data contain noise. Furthermore, rule-based classifiers do not suffer from the replicated subtree problem that affects classifiers induced by top-down induction of decision trees (also known as 'Divide and Conquer'). This research explores some recent developments in a family of rule-based classifiers, the Prism family, and more particularly the G-Prism-FB and G-Prism-DB algorithms, in terms of the local discretisation methods used to induce rule terms for continuous data. The paper then proposes a new algorithm of the Prism family based on a combination of Gauss Probability Density Distribution (GPDD), InterQuartile Range (IQR) and data transformation methods. This new rule-based algorithm, termed G-Rules-IQR, is evaluated empirically and outperforms other members of the Prism family in execution time, accuracy and tentative accuracy.
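To make the idea of an IQR-based rule term for a continuous attribute concrete, the following is a minimal sketch and not the G-Rules-IQR algorithm itself. It assumes that a candidate rule term takes the form lower <= x <= upper, with the bounds set to the first and third quartiles of the attribute values belonging to the target class, and that the term is scored by the fraction of covered instances that belong to that class. The function name `iqr_rule_term`, its parameters and the scoring choice are hypothetical; the GPDD and data transformation steps named above are not modelled.

```python
from statistics import quantiles


def iqr_rule_term(values, labels, target):
    """Return (lower, upper, precision) for the term lower <= x <= upper.

    Illustrative assumption: the bounds are the first and third quartiles
    of the attribute values observed for the target class.
    """
    class_values = [v for v, y in zip(values, labels) if y == target]
    if len(class_values) < 2:
        raise ValueError("need at least two values of the target class")

    # Quartiles of the target-class values; "inclusive" keeps both
    # bounds inside the observed range of those values.
    q1, _, q3 = quantiles(class_values, n=4, method="inclusive")

    # Labels of all instances (of any class) covered by the candidate term.
    covered = [y for v, y in zip(values, labels) if q1 <= v <= q3]

    # Fraction of covered instances that belong to the target class.
    precision = (
        sum(y == target for y in covered) / len(covered) if covered else 0.0
    )
    return q1, q3, precision


if __name__ == "__main__":
    values = [1.0, 2.0, 3.0, 4.0, 5.0, 10.0, 11.0, 12.0]
    labels = ["a"] * 5 + ["b"] * 3
    print(iqr_rule_term(values, labels, "a"))
```

In this sketch the term is derived from a single pass over the attribute values of one class, which illustrates why a quartile-based bound can be cheaper than searching over every candidate cut point.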
