Learning Quickly When Irrelevant Attributes Abound: A New Linear-Threshold Algorithm

Valiant (1984) and others have studied the problem of learning various classes of Boolean functions from examples. Here we discuss incremental learning of these functions. We consider a setting in which the learner responds to each example according to a current hypothesis. Then the learner updates the hypothesis, if necessary, based on the correct classification of the example. One natural measure of the quality of learning in this setting is the number of mistakes the learner makes. For suitable classes of functions, learning algorithms are available that make a bounded number of mistakes, with the bound independent of the number of examples seen by the learner. We present one such algorithm that learns disjunctive Boolean functions, along with variants for learning other classes of Boolean functions. The basic method can be expressed as a linear-threshold algorithm. A primary advantage of this algorithm is that the number of mistakes grows only logarithmically with the number of irrelevant attributes in the examples. At the same time, the algorithm is computationally efficient in both time and space.
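The linear-threshold scheme described above can be sketched as a simple online learner: keep one weight per attribute, predict by comparing a weighted sum of the active attributes to a threshold, and multiplicatively promote or demote weights only when a mistake occurs. This is a minimal illustrative sketch; the parameter choices (initial weight 1, threshold equal to the number of attributes n, promotion/demotion factor 2) and the function names are assumptions for the example, not specifics taken from the paper.

```python
def winnow_predict(weights, x, threshold):
    """Predict 1 if the weighted sum over active (xi = 1) attributes reaches the threshold."""
    total = sum(w for w, xi in zip(weights, x) if xi)
    return 1 if total >= threshold else 0

def winnow_update(weights, x, y, y_hat, factor=2.0):
    """On a mistake, multiplicatively adjust only the weights of active attributes."""
    if y_hat == y:
        return weights
    if y == 1:
        # False negative: promote the active attributes.
        return [w * factor if xi else w for w, xi in zip(weights, x)]
    # False positive: demote the active attributes.
    return [w / factor if xi else w for w, xi in zip(weights, x)]

def learn_disjunction(examples, n):
    """Run the online learner over a stream of (x, y) examples and count mistakes."""
    weights = [1.0] * n
    threshold = n  # illustrative choice of threshold
    mistakes = 0
    for x, y in examples:
        y_hat = winnow_predict(weights, x, threshold)
        if y_hat != y:
            mistakes += 1
            weights = winnow_update(weights, x, y, y_hat)
    return weights, mistakes
```

Because an irrelevant attribute's weight is only ever changed when it happens to be active on a mistake, and each relevant attribute needs only about log2(n) promotions to cross the threshold, the multiplicative update is what yields a mistake bound that grows logarithmically, rather than linearly, with the number of irrelevant attributes.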

[1] Vladimir Vapnik and Alexey Chervonenkis. On the uniform convergence of relative frequencies of events to their probabilities, 1971.

[2] Saburo Muroga. Threshold logic and its applications, 1971.

[3] Richard O. Duda and Peter E. Hart. Pattern classification and scene analysis. Wiley-Interscience, 1974.

[4] Temple F. Smith. Occam's razor. Nature, 1980.

[5] Tom M. Mitchell. Generalization as Search. Artificial Intelligence, 1982.

[6] Vladimir Vapnik. Estimation of Dependences Based on Empirical Data. Springer Series in Statistics, 1982.

[7] Dana Angluin and Carl H. Smith. Inductive Inference: Theory and Methods. ACM Computing Surveys, 1983.

[8] Leslie G. Valiant. A theory of the learnable. Communications of the ACM, 1984.

[9] Leslie G. Valiant. Learning Disjunctions of Conjunctions. IJCAI, 1985.

[10] Ranan B. Banerji. The Logic of Learning: A Basis for Pattern Recognition and for Improvement of Performance. Advances in Computers, 1985.

[11] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Classifying learnable geometric concepts with the Vapnik-Chervonenkis dimension. STOC, 1986.

[12] David Haussler. Quantifying the Inductive Bias in Concept Learning (Extended Abstract). AAAI, 1986.

[13] David E. Rumelhart and James L. McClelland. Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Vol. 1: Foundations, 1986.

[14] Michael Kearns, Ming Li, Leonard Pitt, and Leslie G. Valiant. On the learnability of Boolean formulae. STOC, 1987.

[15] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Occam's Razor. Information Processing Letters, 1987.

[16] Michael Kearns et al. Recent Results on Boolean Concept Learning, 1987.

[17] David Haussler, Nick Littlestone, and Manfred K. Warmuth. Predicting {0,1}-functions on randomly drawn points. COLT, 1988.

[18] Anselm Blumer, Andrzej Ehrenfeucht, David Haussler, and Manfred K. Warmuth. Learnability and the Vapnik-Chervonenkis dimension. Journal of the ACM, 1989.
