On the development of inductive learning algorithms: generating flexible and adaptable concept representations

A vast amount of research in machine learning has focused on creating new algorithms by refining existing learning models (e.g., neural networks, decision trees, instance-based learners, rule-based systems, Bayesian estimators). This is convenient when a model M already exists that performs satisfactorily over the class of domains of study S. It may occur, however, that no model performs adequately over S; the definition of a new learning model is then necessary. The construction of a new learning model cannot rely on assumptions common to a process based on refinements alone: that a global evaluation metric is enough to guide the development process (e.g., a comparison based on predictive accuracy alone), and that the strategy of the model is always correct (inherent limitations in the strategy may exist).

When a model is defined from scratch, it is recommended to adopt a functional view. This means we must first define the set of functionalities the new mechanism will perform (e.g., to cope with feature and class noise, to generate hypotheses amenable to interpretation and analysis, to elucidate feature interaction effectively). After implementation, a functional decomposition analysis can assess how well the different components of the mechanism carry out those functionalities.

In the first part of this thesis I outline basic steps for the development of new learning models. The outline conveys relevant information to the developer: 1) the set of goal functionalities must be made clear from the beginning, and the design and implementation must be oriented toward accomplishing them; 2) much can be learned from a functional decomposition analysis, since analyzing each component individually allows possible deficiencies to be detected; 3) it is vital to propose a design in which all components are made explicit, and thus amenable to modification or replacement. To provide evidence supporting the importance of a functional view in the design of new learning models, I describe two experimental case studies. These studies support the following ideas: that separating an algorithm into its constituent components helps identify the individual contribution of each component, and that components of a different nature can be integrated to achieve a desired functionality.

The second part of this thesis deals with the design and implementation of a new learning algorithm: HCL (Hierarchical Concept Learner). This part has two main goals: to put into practice the basic steps for the development of …
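As a minimal sketch of the design principle described above (not the thesis's HCL algorithm, and with all names such as MajorityRuleLearner, identity_features, and pairwise_products being hypothetical), the following Python fragment shows a learner assembled from explicit, replaceable components, so that a functional decomposition analysis can vary one component at a time and measure its individual contribution:

```python
# Minimal sketch: a learner whose components are explicit and swappable,
# so each one can be evaluated in isolation (functional decomposition).
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Example = Tuple[Sequence[float], int]  # (feature vector, class label)


def identity_features(x: Sequence[float]) -> Sequence[float]:
    """Default feature-construction component: leave features unchanged."""
    return x


def pairwise_products(x: Sequence[float]) -> Sequence[float]:
    """Alternative component: add pairwise products to expose feature interaction."""
    extra = [a * b for i, a in enumerate(x) for b in x[i + 1:]]
    return list(x) + extra


class MajorityRuleLearner:
    """Toy learner whose feature-construction step is a constructor argument,
    so it can be replaced without touching the rest of the mechanism."""

    def __init__(self,
                 construct: Callable[[Sequence[float]], Sequence[float]] = identity_features):
        self.construct = construct
        self.default_label = 0

    def fit(self, data: List[Example]) -> "MajorityRuleLearner":
        transformed = [(self.construct(x), y) for x, y in data]
        self.default_label = Counter(y for _, y in transformed).most_common(1)[0][0]
        return self

    def predict(self, x: Sequence[float]) -> int:
        return self.default_label


def accuracy(model: MajorityRuleLearner, data: List[Example]) -> float:
    return sum(model.predict(x) == y for x, y in data) / len(data)


if __name__ == "__main__":
    train = [([0.1, 0.9], 1), ([0.8, 0.2], 0), ([0.7, 0.6], 1)]
    test = [([0.2, 0.8], 1), ([0.9, 0.1], 0)]
    # Functional decomposition analysis: swap one component, hold the rest fixed.
    for name, component in [("identity", identity_features), ("products", pairwise_products)]:
        model = MajorityRuleLearner(construct=component).fit(train)
        print(f"feature constructor = {name}: accuracy = {accuracy(model, test):.2f}")
```

Because the feature-construction step is an explicit argument rather than hard-wired logic, its contribution can be isolated and the component modified or replaced, which is exactly what the functional view advocated above requires.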
