An algorithmic theory of learning: Robust concepts and random projection

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel analysis for a model of robust concept learning (closely related to "margin classifiers"), and show that a relatively small number of examples are sufficient to learn rich concept classes (including threshold functions, Boolean formulae and polynomial surfaces). As a result, we obtain simple, intuitive proofs for the generalization bounds of Support Vector Machines. In addition, the new algorithms have several advantages: they are faster, conceptually simpler, and highly resistant to noise. For example, a robust half-space can be PAC-learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that "more robust concepts are easier to learn", is supported by a multitude of psychological studies.
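To make the projection-then-learn idea concrete, here is a minimal sketch (assuming NumPy) of learning a robust half-space after a random projection: the data are mapped to a low dimension with a random Gaussian matrix, which approximately preserves margins, and a perceptron is then trained on the projected examples. The function names, the choice of projection matrix, and all parameters below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def random_projection(X, k, rng=None):
    """Project rows of X from R^n down to R^k with a random Gaussian matrix.

    By the Johnson-Lindenstrauss lemma, pairwise distances (and hence
    margins, up to scaling) are approximately preserved with high
    probability when k is on the order of log(m) / eps^2.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[1]
    R = rng.standard_normal((n, k)) / np.sqrt(k)  # entries N(0, 1/k)
    return X @ R

def perceptron(X, y, max_passes=100):
    """Classical perceptron on (projected) examples with labels in {-1, +1}.

    For a robust concept with margin gamma, the number of mistakes is
    bounded by (R/gamma)^2, independent of the ambient dimension.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (w @ x) <= 0:   # misclassified or on the boundary
                w += label * x
                mistakes += 1
        if mistakes == 0:
            break
    return w

# Toy usage: a half-space in n = 10,000 dimensions, learned after
# projecting a modest number of examples down to k = 50 dimensions.
rng = np.random.default_rng(0)
n, m, k = 10_000, 200, 50
w_true = rng.standard_normal(n)
w_true /= np.linalg.norm(w_true)
X = rng.standard_normal((m, n))
y = np.sign(X @ w_true)
X_proj = random_projection(X, k, rng=1)
w_hat = perceptron(X_proj, y)
train_acc = np.mean(np.sign(X_proj @ w_hat) == y)
```

Because the perceptron runs in the projected dimension k rather than the original dimension n, the overall cost is dominated by the projection itself, which is consistent with the linear-time claim above; the exact sample and dimension bounds are those derived in the paper, not the ad hoc constants used in this sketch.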
