An algorithmic theory of learning: Robust concepts and random projection

We study the phenomenon of cognitive learning from an algorithmic standpoint. How does the brain effectively learn concepts from a small number of examples, despite the fact that each example contains a huge amount of information? We provide a novel algorithmic analysis via a model of robust concept learning (closely related to “margin classifiers”), and show that a relatively small number of examples suffices to learn rich concept classes. The new algorithms have several advantages: they are faster, conceptually simpler, and resistant to low levels of noise. For example, a robust half-space can be learned in linear time using only a constant number of training examples, independent of the number of attributes. A general (algorithmic) consequence of the model, that “more robust concepts are easier to learn”, is supported by a multitude of psychological studies.
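The core idea can be illustrated with a small sketch: project high-dimensional examples of a robust (large-margin) halfspace down to a low-dimensional space with a random Gaussian matrix, then run an ordinary perceptron there. This is a simplified illustration, not the paper's exact algorithm; the dimensions, margin, and sample sizes below are arbitrary choices for the demo.

```python
import random

random.seed(0)

n, k = 200, 100       # ambient and projected dimensions (illustrative choices)
margin = 0.5          # robustness parameter of the target halfspace

# Target concept: sign(x[0]) over unit vectors, with every example at
# distance >= margin from the boundary (a "robust" concept).
def sample_point():
    y = random.choice([-1, 1])
    a = random.uniform(margin, 1.0)                 # |x[0]| >= margin
    rest = [random.gauss(0, 1) for _ in range(n - 1)]
    norm = sum(v * v for v in rest) ** 0.5
    scale = (1 - a * a) ** 0.5 / norm               # keep x on the unit sphere
    return [y * a] + [v * scale for v in rest], y

# Random projection R: k x n with N(0, 1/k) entries; by the
# Johnson-Lindenstrauss lemma it approximately preserves margins.
R = [[random.gauss(0, 1 / k ** 0.5) for _ in range(n)] for _ in range(k)]

def project(x):
    return [sum(r * v for r, v in zip(row, x)) for row in R]

train = [(project(x), y) for x, y in (sample_point() for _ in range(50))]

# Plain perceptron in the low-dimensional projected space; the preserved
# margin keeps the mistake bound (and hence the running time) small.
w = [0.0] * k
for _ in range(500):
    mistakes = 0
    for z, y in train:
        if y * sum(wi * zi for wi, zi in zip(w, z)) <= 0:
            w = [wi + y * zi for wi, zi in zip(w, z)]
            mistakes += 1
    if mistakes == 0:
        break

def accuracy(data):
    return sum(1 for z, y in data
               if y * sum(wi * zi for wi, zi in zip(w, z)) > 0) / len(data)

train_acc = accuracy(train)
test = [(project(x), y) for x, y in (sample_point() for _ in range(200))]
test_acc = accuracy(test)
```

Note that the number of training examples and the projected dimension depend only on the margin, not on the ambient dimension n, which is the qualitative point of the analysis.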
