Empirical learning as a function of concept character

Concept learning depends on the character of the data. To discover how, some researchers have used theoretical analysis to relate the behavior of idealized learning algorithms to classes of concepts. Others have developed pragmatic measures that relate the behavior of empirical systems such as ID3 and PLS1 to the kinds of concepts encountered in practice. Before learning behavior can be predicted, however, concepts and data must be characterized. Data characteristics include the number of examples, their error, their “size”, and so forth. Although potential characteristics are numerous, they are constrained by the way one views concepts. Viewing concepts as functions over instance space leads to geometric characteristics such as concept size (the proportion of positive instances) and concentration (not too many “peaks”). Experiments show that some of these characteristics drastically affect the accuracy of concept learning. Data characteristics sometimes interact in non-intuitive ways; for example, noisy data may degrade accuracy differently depending on the size of the concept. Compared with the effects of some data characteristics, the choice of learning algorithm appears less important: accuracy degrades only slightly when the splitting criterion is replaced with random selection. Analyzing such observations suggests directions for concept learning research.
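The two geometric characteristics named above can be made concrete with a small sketch. It assumes a Boolean instance space and treats a concept as a function from instances to true or false. Concept size follows the definition given in the abstract. The peak count is only one possible reading of concentration: it assumes a "peak" is a connected group of positive instances, with two instances adjacent when they differ in exactly one attribute. The function names and the two example concepts are hypothetical and are not taken from the paper.

```python
from itertools import product


def instance_space(n):
    """All instances over n Boolean attributes."""
    return list(product((0, 1), repeat=n))


def concept_size(concept, n):
    """Proportion of instances that the concept labels positive."""
    space = instance_space(n)
    return sum(1 for x in space if concept(x)) / len(space)


def count_peaks(concept, n):
    """Number of connected groups of positive instances.

    Assumption: two instances are neighbours when they differ in
    exactly one attribute. This is an illustrative stand-in for
    "peaks", not the paper's measure.
    """
    positives = {x for x in instance_space(n) if concept(x)}
    seen = set()
    peaks = 0
    for start in positives:
        if start in seen:
            continue
        peaks += 1
        seen.add(start)
        stack = [start]
        # Depth-first walk over positive neighbours of this group.
        while stack:
            x = stack.pop()
            for i in range(n):
                y = x[:i] + (1 - x[i],) + x[i + 1:]
                if y in positives and y not in seen:
                    seen.add(y)
                    stack.append(y)
    return peaks


def conjunction(x):
    """Hypothetical concentrated concept: first two attributes both set."""
    return x[0] == 1 and x[1] == 1


def parity(x):
    """Hypothetical dispersed concept: odd number of attributes set."""
    return sum(x) % 2 == 1


if __name__ == "__main__":
    for name, concept in (("conjunction", conjunction), ("parity", parity)):
        print(name, concept_size(concept, 4), count_peaks(concept, 4))
```

Over four attributes, the conjunction covers a quarter of the instance space in a single connected region, whereas parity covers half of it in eight isolated instances. Under the stated adjacency assumption, this is the kind of contrast between a concentrated and a dispersed concept that the geometric view is meant to capture.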
