An algorithmic theory of learning: Robust concepts and random projection

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples despite the fact that each example contains a huge amount of information? We provide a novel analysis for a model of robust concept learning (closely related to "margin classifiers"), and show that a relatively small number of examples are sufficient to learn rich concept classes (including threshold functions, Boolean formulae and polynomial surfaces). As a result, we obtain simple, intuitive proofs for the generalization bounds of Support Vector Machines. In addition, the new algorithms have several advantages: they are faster, conceptually simpler, and highly resistant to noise. For example, a robust half-space can be PAC-learned in linear time using only a constant number of training examples, regardless of the number of attributes. A general (algorithmic) consequence of the model, that "more robust concepts are easier to learn", is supported by a multitude of psychological studies.
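To make the projection-then-learn idea concrete, here is a minimal sketch (assuming NumPy) of learning a robust half-space after a random projection: the data are mapped to a low dimension with a random Gaussian matrix, which approximately preserves margins, and a perceptron is then trained on the projected examples. The function names, the choice of projection matrix, and all parameters below are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def random_projection(X, k, rng=None):
    """Project rows of X from R^n down to R^k with a random Gaussian matrix.

    By the Johnson-Lindenstrauss lemma, pairwise distances (and hence
    margins, up to scaling) are approximately preserved with high
    probability when k is on the order of log(m) / eps^2.
    """
    rng = np.random.default_rng(rng)
    n = X.shape[1]
    R = rng.standard_normal((n, k)) / np.sqrt(k)  # entries N(0, 1/k)
    return X @ R

def perceptron(X, y, max_passes=100):
    """Classical perceptron on (projected) examples with labels in {-1, +1}.

    For a robust concept with margin gamma, the number of mistakes is
    bounded by (R/gamma)^2, independent of the ambient dimension.
    """
    w = np.zeros(X.shape[1])
    for _ in range(max_passes):
        mistakes = 0
        for x, label in zip(X, y):
            if label * (w @ x) <= 0:   # misclassified or on the boundary
                w += label * x
                mistakes += 1
        if mistakes == 0:
            break
    return w

# Toy usage: a half-space in n = 10,000 dimensions, learned after
# projecting a modest number of examples down to k = 50 dimensions.
rng = np.random.default_rng(0)
n, m, k = 10_000, 200, 50
w_true = rng.standard_normal(n)
w_true /= np.linalg.norm(w_true)
X = rng.standard_normal((m, n))
y = np.sign(X @ w_true)
X_proj = random_projection(X, k, rng=1)
w_hat = perceptron(X_proj, y)
train_acc = np.mean(np.sign(X_proj @ w_hat) == y)
```

Because the perceptron runs in the projected dimension k rather than the original dimension n, the overall cost is dominated by the projection itself, which is consistent with the linear-time claim above; the exact sample and dimension bounds are those derived in the paper, not the ad hoc constants used in this sketch.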
