Learning and Generalization with Bounded Order Rule Sets

All rule-based methods currently found in the literature use some form of heuristic to limit the size of the rule search space examined by the learning algorithm. This paper attempts to determine, mainly from an empirical standpoint, how generalization performance is affected when certain areas of the rule search space are ignored, compared with when the entire search space is considered. To do so, all rules are exhaustively generated for several small real-world problems, and the decrease in accuracy is measured as the size of the search space is iteratively reduced. The results show that higher-order rules are not required to approximate many real-world learning problems. In addressing this question, several methods for inducing rules and using them to classify novel examples are also tested.
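The idea of bounding rule order can be illustrated with a minimal sketch. It is not the paper's algorithm. It assumes that a rule is a conjunction of attribute-value tests paired with class counts, that the "order" of a rule is the number of tests in the conjunction, and that novel examples are classified by a simple count-weighted vote over all matching rules. The names `induce_rules`, `classify`, and `max_order` are hypothetical.

```python
from collections import Counter, defaultdict
from itertools import combinations


def conditions(x, max_order):
    """Yield every attribute-value conjunction of order 1..max_order matched by x."""
    for order in range(1, max_order + 1):
        for attrs in combinations(range(len(x)), order):
            yield tuple((i, x[i]) for i in attrs)


def induce_rules(examples, labels, max_order):
    """Exhaustively build all rules up to max_order (assumed rule form).

    Maps each condition to a Counter of the class labels of the training
    examples that satisfy it.
    """
    rules = defaultdict(Counter)
    for x, y in zip(examples, labels):
        for cond in conditions(x, max_order):
            rules[cond][y] += 1
    return rules


def classify(rules, x, max_order):
    """Classify x by summing class counts over all matching rules (assumed voting scheme)."""
    votes = Counter()
    for cond in conditions(x, max_order):
        if cond in rules:
            votes.update(rules[cond])
    return votes.most_common(1)[0][0] if votes else None


# Toy data, invented for illustration only.
examples = [(0, 0), (0, 1), (1, 0), (1, 1)]
labels = [0, 1, 1, 0]

first_order = induce_rules(examples, labels, max_order=1)
second_order = induce_rules(examples, labels, max_order=2)
predictions = [classify(second_order, x, max_order=2) for x in examples]
```

Lowering `max_order` shrinks the set of rules considered, which is the kind of search-space reduction whose effect on accuracy the paper measures. On real data the accuracy would be estimated on held-out examples rather than on the training set.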
