A tutorial on kernel methods for categorization

The abilities to learn and to categorize are fundamental to cognitive systems, be they animals or machines, and have therefore attracted attention from engineers and psychologists alike. Modern machine learning methods and psychological models of categorization are remarkably similar, partly because the two fields share a common history in artificial neural networks and reinforcement learning. However, machine learning is now an independent and mature field that has moved beyond psychologically or neurally inspired algorithms towards providing foundations for a theory of learning rooted in statistics and functional analysis. Much of this research is potentially interesting for psychological theories of learning and categorization, but it is hardly accessible to psychologists. Here, we provide a tutorial introduction to a popular class of machine learning tools called kernel methods. These methods are closely related to perceptrons, radial-basis-function neural networks and exemplar theories of categorization. Recent theoretical advances in machine learning are closely tied to the idea that the similarity of patterns can be encapsulated in a positive definite kernel. Such a kernel defines a reproducing kernel Hilbert space, which allows one to use powerful tools from functional analysis to analyze learning algorithms. We give basic explanations of some key concepts (the so-called kernel trick, the representer theorem and regularization), which may open up the possibility that insights from machine learning can feed back into psychology.
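
As a rough illustration of how the three key concepts fit together, the following minimal sketch fits a regularized least-squares classifier with a Gaussian kernel to the XOR pattern, which no linear decision rule can separate. It is not taken from the tutorial itself: the function names (`gaussian_kernel`, `fit_kernel_ridge`, `predict`), the toy data, and the parameter values are all assumptions chosen for illustration, and regularized least squares is only one of several kernel methods that could be used here.

```python
import numpy as np

def gaussian_kernel(X, Y, bandwidth=1.0):
    """Pairwise similarity k(x, y) = exp(-||x - y||^2 / (2 * bandwidth^2))."""
    sq_dists = ((X[:, None, :] - Y[None, :, :]) ** 2).sum(axis=-1)
    return np.exp(-sq_dists / (2.0 * bandwidth ** 2))

def fit_kernel_ridge(X, y, reg=0.1, bandwidth=1.0):
    """Solve (K + reg * I) alpha = y for one weight per training exemplar.

    `reg` is the regularization strength (hypothetical value): larger values
    shrink the weights and give a smoother, less flexible solution.
    """
    K = gaussian_kernel(X, X, bandwidth)
    return np.linalg.solve(K + reg * np.eye(len(X)), y)

def predict(X_train, alpha, X_new, bandwidth=1.0):
    """Representer-theorem form: f(x) = sum_i alpha_i * k(x_i, x)."""
    return gaussian_kernel(X_new, X_train, bandwidth) @ alpha

# Toy XOR-like category structure (illustrative data, not from the paper).
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([-1.0, 1.0, 1.0, -1.0])

alpha = fit_kernel_ridge(X, y)
labels = np.sign(predict(X, alpha, X))

# Smallest eigenvalue of the Gram matrix; positive for this kernel and data.
min_eig = np.linalg.eigvalsh(gaussian_kernel(X, X)).min()
```

The code only ever evaluates the kernel on pairs of stimuli and never constructs an explicit feature representation, which is the sense in which the kernel trick is used. The fitted function is a weighted sum of similarities to the stored exemplars, the form that the representer theorem guarantees for this kind of regularized problem, and it resembles the summed-similarity rule of exemplar models.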
