Concept learning as motor program induction: A large-scale empirical study

Brenden M. Lake
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Ruslan Salakhutdinov
Department of Statistics, University of Toronto

Joshua B. Tenenbaum
Department of Brain and Cognitive Sciences, Massachusetts Institute of Technology

Abstract

Human concept learning is particularly impressive in two respects: the internal structure of concepts can be representationally rich, and yet the very same concepts can also be learned from just a few examples. Several decades of research have dramatically advanced our understanding of these two aspects of concepts. While the richness and speed of concept learning are most often studied in isolation, the power of human concepts may be best explained through their synthesis. This paper presents a large-scale empirical study of one-shot concept learning, suggesting that rich generative knowledge in the form of a motor program can be induced from just a single example of a novel concept. Participants were asked to draw novel handwritten characters given a reference form, and we recorded the motor data used for production. Multiple drawers of the same character not only produced visually similar drawings, but they also showed a striking correspondence in their strokes, as measured by their number, shape, order, and direction. This suggests that participants can infer a rich motor-based concept from a single example. We also show that the motor programs induced by individual subjects provide a powerful basis for one-shot classification, yielding far higher accuracy than state-of-the-art pattern recognition methods based on just the visual form.

Keywords: concept learning; one-shot learning; structured representations; program induction

The power of human thought derives from the power of our concepts. With the concept "car," we can classify or even imagine new instances, infer missing or occluded parts, parse an object into its main components (wheels, windows, etc.), reason about a familiar thing in an unfamiliar situation (a car underwater), and even create new compositions of concepts (a car-plane). These abilities to generalize flexibly, to go beyond the data given, suggest that human concepts must be representationally rich. Yet it is remarkable how little data is required to learn a new concept. From just one or a handful of examples, a child can learn a new word and use it appropriately (Carey & Bartlett, 1978; Markman, 1989; Bloom, 2000; Xu & Tenenbaum, 2007). Likewise, after seeing a single "Segway" or "iPad," an adult can grasp the meaning of the word, an ability called "one-shot learning." A central challenge is thus to explain these two remarkable capacities: what kinds of representations can support such flexible generalizations, and what kinds of learning mechanisms can acquire a new concept so quickly? The greater puzzle is putting them together: how can such flexible representations be learned from only one or a few examples?

Over the last couple of decades, the cognitive science of concepts has divided into different traditions, focused largely on either the richness of concepts or on learning from sparse data. In contrast to the simple representations popular in early cognitive models (e.g., prototypes; Rosch, Simpson, & Miller, 1976) or conventional machine learning (e.g., support vector machines), one tradition has worked to develop more structured representations that can generalize in deeper and more flexible ways. Concepts have been characterized in terms of "intuitive theories," which are mental explanations that underlie a concept (e.g., Murphy & Medin, 1985), or "structural description" models, which are compositional representations based on parts and relations (e.g., Winston, 1975; Hummel & Biederman, 1992). In the latter framework, the concept "Segway" might be represented as two wheels connected by a platform, which supports a motor, etc. Most recently, research in AI and cognitive science has emphasized rich generative representations. Concepts like "house" can vary in both the number and configuration of their parts (windows, doors, balconies, etc.), much like the variable syntactic structure of language. This has led researchers to model objects and scenes using generative grammars (Wang et al., 2006; Savova, Jakel, & Tenenbaum, 2009; Zhu, Chen, & Yuille, 2009) or programs (Stuhlmuller, Tenenbaum, & Goodman, 2010).

A different tradition has focused more on rapid learning and less on conceptual richness. People can acquire a concept from as little as one positive example, contrasting with early work in psychology and standard machine learning that has focused on learning from many positive and negative examples. Bayesian analyses have shown how one-shot learning can be explained with appropriately constrained hypothesis spaces and priors (Shepard, 1987; Tenenbaum & Griffiths, 2001), but where do these constraints come from? For simple prototype-based representations of concepts, rapid generalization can occur by just sharpening particular dimensions or features, as described in theories of attentional learning (Smith, Jones, Landau, Gershkoff-Stowe, & Samuelson, 2002) and overhypotheses in hierarchical Bayesian models (Kemp, Perfors, & Tenenbaum, 2007). From this perspective, prior experience with various object concepts may highlight the most relevant dimensions for whole classes of concepts, like the "shape bias" in learning object names (as opposed to a "color" or "material bias"). It is also possible to learn new features over the course of learning the concepts (Schyns, Goldstone, & Thibaut, 1998), and recent work has combined dimensional sharpening with sophisticated methods for feature learning (Salakhutdinov, Tenenbaum, & Torralba, 2011).

Despite these different avenues of progress, we are still far from a satisfying unified account. The models that explain how people learn to perform one-shot learning are restricted to the simplest prototype- or feature-based representations; they have not been developed for more sophisticated representations of concepts such as structural descriptions, grammars, or programs. There are also reasons to suspect that these richer representations would be difficult if not impossible
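As an aside on the kind of stroke comparison described above (similarity of stroke shape and direction across drawers), one standard way to compare two pen trajectories of different lengths is dynamic time warping, the alignment method of Sakoe and Chiba (1978) cited in the references. The sketch below is purely illustrative and not the paper's actual analysis pipeline; the function name and data format are hypothetical.

```python
# Illustrative sketch (NOT the paper's method): measuring the similarity
# of two pen strokes with dynamic time warping (DTW). A stroke is a
# hypothetical list of (x, y) pen positions recorded during drawing.

def dtw_distance(stroke_a, stroke_b):
    """DTW distance between two strokes, each a list of (x, y) points."""
    n, m = len(stroke_a), len(stroke_b)
    INF = float("inf")
    # cost[i][j] = best alignment cost of stroke_a[:i] with stroke_b[:j]
    cost = [[INF] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            ax, ay = stroke_a[i - 1]
            bx, by = stroke_b[j - 1]
            d = ((ax - bx) ** 2 + (ay - by) ** 2) ** 0.5  # Euclidean point cost
            cost[i][j] = d + min(cost[i - 1][j],      # stretch stroke_b
                                 cost[i][j - 1],      # stretch stroke_a
                                 cost[i - 1][j - 1])  # match both points
    return cost[n][m]

# Two drawers tracing the same diagonal in the same direction align
# perfectly; reversing the direction of one stroke inflates the cost,
# which is why direction (not just visual form) matters in the comparison.
s1 = [(0, 0), (1, 1), (2, 2), (3, 3)]
s2 = [(0, 0), (1, 1), (2, 2), (3, 3)]
print(dtw_distance(s1, s2))                  # identical strokes: 0.0
print(dtw_distance(s1, list(reversed(s2))))  # reversed direction: > 0
```

Because DTW only stretches time and never reorders points, it distinguishes strokes drawn in opposite directions even when their static images are identical, matching the paper's emphasis on motor (not just visual) correspondence.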

References

[1] A. M. Liberman, et al. Perception of the speech code. Psychological Review, 1967.

[2] W. Montague, et al. Category norms of verbal items in 56 categories: A replication and extension of the Connecticut category norms. 1969.

[3] Patrick Henry Winston, et al. Learning structural descriptions from examples. 1970.

[4] J. Goodnow, et al. "The grammar of action": Sequence and syntax in children's copying. 1973.

[5] E. Rosch, et al. Structural bases of typicality effects. 1976.

[6] Patrick Henry Winston, et al. The psychology of computer vision. Pattern Recognition, 1976.

[7] S. Chiba, et al. Dynamic programming algorithm optimization for spoken word recognition. 1978.

[8] Susan Carey, et al. Acquiring a single new word. 1978.

[9] Reuven Y. Rubinstein, et al. Simulation and the Monte Carlo method. Wiley Series in Probability and Mathematical Statistics, 1981.

[10] J. Freyd, et al. Representing the dynamics of a static form. Memory & Cognition, 1983.

[11] P. Sommers. Drawing and cognition: Innovations, primitives, contour, and space in children's drawings. 1984.

[12] D. Medin, et al. The role of theories in conceptual coherence. Psychological Review, 1985.

[13] R. Shepard, et al. Toward a universal law of generalization for psychological science. Science, 1987.

[14] Roger N. Shepard, et al. Toward a universal law of generalization. Science, 1988.

[15] E. Markman. Categorization and naming in children. 1989.

[16] I. Biederman, et al. Dynamic binding in a neural network for shape recognition. Psychological Review, 1992.

[17] Elie Bienenstock, et al. Neural networks and the bias/variance dilemma. Neural Computation, 1992.

[18] Feldman, et al. The structure of perceptual categories. Journal of Mathematical Psychology, 1997.

[19] Robert L. Goldstone, et al. The development of features in object concepts. Behavioral and Brain Sciences, 1998.

[20] M. Minami. How children learn the meanings of words. 2001.

[21] J. Tenenbaum, et al. Generalization, similarity, and Bayesian inference. Behavioral and Brain Sciences, 2001.

[22] Linda B. Smith, et al. Object name learning provides on-the-job training for attention. Psychological Science, 2002.

[23] Mary P. Harper, et al. Hierarchical stochastic image grammars for classification and segmentation. IEEE Transactions on Image Processing, 2006.

[24] J. Tenenbaum, et al. Word learning as Bayesian inference. Psychological Review, 2007.

[25] J. Tenenbaum, et al. Learning overhypotheses with hierarchical Bayesian models. 2007.

[26] Kenneth D. Forbus, et al. Incremental learning of perceptual categories for open-domain sketch recognition. IJCAI, 2007.

[27] J. Grainger, et al. Letter perception: from pixels to pandemonium. Trends in Cognitive Sciences, 2008.

[28] Joshua B. Tenenbaum, et al. Grammar-based object representations in a scene parsing task. 2009.

[29] Geoffrey E. Hinton, et al. Deep Boltzmann machines. AISTATS, 2009.

[30] Charles Kemp, et al. Abstraction and relational learning. NIPS, 2009.

[31] Long Zhu, et al. Unsupervised learning of probabilistic grammar-Markov models for object categories. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011.

[32] I. Gilboa. The role of theories. 2009.

[33] Joshua B. Tenenbaum, et al. Learning structured generative concepts. 2010.

[34] Joshua B. Tenenbaum, et al. One shot learning of simple visual concepts. CogSci, 2011.

[35] Joshua B. Tenenbaum, et al. Learning to learn with compound HD models. NIPS, 2011.