Empirical Learning as a Function of Concept Character

Concept learning depends on the character of the data. To discover how, some researchers have used theoretical analysis to relate the behavior of idealized learning algorithms to classes of concepts. Others have developed pragmatic measures that relate the behavior of empirical systems such as ID3 and PLS1 to the kinds of concepts encountered in practice. But before learning behavior can be predicted, concepts and data must be characterized. Data characteristics include the number of instances, their error rate, concept “size,” and so forth. Although potential characteristics are numerous, they are constrained by the way one views concepts. Viewing concepts as functions over an instance space leads to geometric characteristics such as concept size (the proportion of positive instances) and concentration (not too many “peaks”). Experiments show that some of these characteristics drastically affect the accuracy of concept learning. Data characteristics sometimes interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with the effects of some data characteristics, the choice of learning algorithm appears less important: accuracy degrades only slightly when the splitting criterion is replaced with random selection. Analyzing such observations suggests directions for concept learning research.
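The two geometric characteristics named above can be made concrete for Boolean instance spaces. The sketch below is illustrative, not the paper's own measures: it takes concept size as the fraction of the instance space labeled positive, and (as one plausible reading of “peaks”) counts concentration as the number of connected positive regions, treating two instances as adjacent when they differ in exactly one feature. The function names and the adjacency convention are assumptions for illustration.

```python
from itertools import product

def concept_size(concept, n_features):
    """Fraction of the Boolean instance space labeled positive by `concept`."""
    instances = list(product([0, 1], repeat=n_features))
    positives = [x for x in instances if concept(x)]
    return len(positives) / len(instances)

def concentration(concept, n_features):
    """Number of connected positive regions ('peaks'), where two instances
    are adjacent when they differ in exactly one feature (Hamming distance 1)."""
    unvisited = {x for x in product([0, 1], repeat=n_features) if concept(x)}
    regions = 0
    while unvisited:
        regions += 1
        stack = [unvisited.pop()]
        while stack:            # flood-fill one connected region
            x = stack.pop()
            for i in range(n_features):
                y = tuple(b ^ (1 if j == i else 0) for j, b in enumerate(x))
                if y in unvisited:
                    unvisited.remove(y)
                    stack.append(y)
    return regions

# A conjunctive concept (x1 AND x2) over 4 features: small and concentrated.
conj = lambda x: x[0] == 1 and x[1] == 1
print(concept_size(conj, 4))   # 0.25 — 4 of 16 instances are positive
print(concentration(conj, 4))  # 1 — a single connected positive region

# A parity (XOR) concept over 3 features: larger but fragmented.
xor = lambda x: (x[0] + x[1]) % 2 == 1
print(concept_size(xor, 3))    # 0.5
print(concentration(xor, 3))   # 2 — positives split into two regions
```

On this reading, conjunctive concepts are small and highly concentrated, while parity-like concepts scatter their positive instances into many regions, which is one intuition for why the latter are harder for selective induction systems.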
