Distributed representations and nested compositional structure

Distributed representations are attractive for a number of reasons. They offer the possibility of representing concepts in a continuous space, they degrade gracefully with noise, and they can be processed in a parallel network of simple processing elements. However, the problem of representing nested structure in distributed representations has long been a prominent concern of both proponents and critics of connectionism (Fodor and Pylyshyn 1988; Smolensky 1990; Hinton 1990). The lack of connectionist representations for complex structure has held back progress in tackling higher-level cognitive tasks such as language understanding and reasoning. In this thesis I review connectionist representations and propose a method for the distributed representation of nested structure, which I call "Holographic Reduced Representations" (HRRs). HRRs provide an implementation of Hinton's (1990) "reduced descriptions". HRRs use circular convolution to associate atomic items, which are represented by vectors. Arbitrary variable bindings, short sequences of various lengths, and predicates can be represented in a fixed-width vector. These representations are items in their own right, and can be used in constructing compositional structures. The noisy reconstructions extracted from convolution memories can be cleaned up by using a separate associative memory that has good reconstructive properties. Circular convolution, which is the basic associative operator for HRRs, can be built into a recurrent neural network that stores and produces sequences. I show that neural network learning techniques can be used with circular convolution to learn representations for items and sequences. One of the attractions of connectionist representations of compositional structures is the possibility of computing without decomposing structures. I show that dot-product comparisons of HRRs for nested structures can be used to estimate the analogical similarity of the structures. This demonstrates how the surface form of connectionist representations can reflect underlying structural similarity and alignment.
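
The binding, decoding, clean-up, and dot-product comparison steps described above can be illustrated with a minimal pure-Python sketch. It is not the thesis's implementation. The vector dimension (512), the item names, the choice of elements drawn from N(0, 1/n), and the use of the involution of a vector as its approximate inverse are all assumptions made for illustration, and circular convolution is computed directly in O(n^2) time for readability.

```python
import math
import random


def random_vector(n, rng):
    """Random item vector with elements drawn i.i.d. from N(0, 1/n) (an assumption)."""
    sd = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sd) for _ in range(n)]


def cconv(a, b):
    """Circular convolution: t[j] = sum_k a[k] * b[(j - k) mod n]."""
    n = len(a)
    return [sum(a[k] * b[(j - k) % n] for k in range(n)) for j in range(n)]


def involution(a):
    """Approximate inverse used for decoding: a_inv[i] = a[(-i) mod n]."""
    n = len(a)
    return [a[-i % n] for i in range(n)]


def add(u, v):
    """Elementwise sum, used to superimpose several bindings in one vector."""
    return [x + y for x, y in zip(u, v)]


def dot(u, v):
    return sum(x * y for x, y in zip(u, v))


def cleanup(noisy, memory):
    """Clean-up memory: name of the stored item with the largest dot product."""
    return max(memory, key=lambda name: dot(noisy, memory[name]))


rng = random.Random(0)
n = 512  # hypothetical vector width
names = ["agent", "object", "mark", "john", "fish", "bread"]  # hypothetical items
memory = {name: random_vector(n, rng) for name in names}

# Role/filler bindings; each binding has the same width as its operands.
agent_mark = cconv(memory["agent"], memory["mark"])
agent_john = cconv(memory["agent"], memory["john"])
object_fish = cconv(memory["object"], memory["fish"])
object_bread = cconv(memory["object"], memory["bread"])

# A structure is the superposition of its bindings, still a width-n vector.
trace = add(agent_mark, object_fish)

# Decode the agent filler: convolve with the approximate inverse of the role,
# then pass the noisy result through the clean-up memory.
noisy = cconv(involution(memory["agent"]), trace)
decoded = cleanup(noisy, memory)

# Dot products compare structures without decomposing them.
shares_binding = add(agent_john, object_fish)   # shares object*fish with trace
shares_nothing = add(agent_john, object_bread)  # shares no binding with trace
sim_shared = dot(trace, shares_binding)
sim_unrelated = dot(trace, shares_nothing)

print(decoded, round(sim_shared, 2), round(sim_unrelated, 2))
```

In this sketch the structure that shares a binding with `trace` is expected to score noticeably higher than the one that shares none, and the clean-up step is expected to recover the filler bound to the `agent` role. This is only a surface-overlap illustration; it does not reproduce the analogical similarity estimates reported in the thesis.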

[1]  G. A. Miller The Magical Number Seven, Plus or Minus Two: Some Limits on Our Capacity for Processing Information , 1956, Psychological Review.

[2]  Milton Abramowitz,et al.  Handbook of Mathematical Functions with Formulas, Graphs, and Mathematical Tables , 1964 .

[3]  D. GABOR,et al.  Holographic Model of Temporal Recall , 1968, Nature.

[4]  H. C. LONGUET-HIGGINS,et al.  Non-Holographic Associative Memory , 1969, Nature.

[5]  Alan R. Jones,et al.  Fast Fourier Transform , 1970, SIGP.

[6]  James A. Anderson,et al.  A theory for the recognition of items from short memorized lists , 1973 .

[7]  Richard A. Roberts,et al.  Signals and linear systems , 1973 .

[8]  Stephen A. Ritz,et al.  Distinctive features, categorical perception, and probability learning: some applications of a neural model , 1977 .

[9]  A. Tversky Features of Similarity , 1977 .

[10]  Ronald J. Brachman,et al.  On the Epistemological Status of Semantic Networks , 1979 .

[11]  Janet Metcalfe,et al.  A composite holographic associative recall model , 1982 .

[12]  J J Hopfield,et al.  Neural networks and physical systems with emergent collective computational abilities. , 1982, Proceedings of the National Academy of Sciences of the United States of America.

[13]  B. Murdock A Theory for the Storage and Retrieval of Item and Associative Information. , 1982 .

[14]  Bennet B. Murdock,et al.  A distributed memory model for serial-order information. , 1983 .

[15]  Raymond Reiter,et al.  On Inheritance Hierarchies With Exceptions , 1983, AAAI.

[16]  David S. Touretzky,et al.  The Mathematics of Inheritance Systems , 1984 .

[17]  Jon M. Slack A Parsing Architecture Based On Distributed Memory Machines , 1984, COLING.

[18]  Donald Geman,et al.  Stochastic Relaxation, Gibbs Distributions, and the Bayesian Restoration of Images , 1984, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[19]  Ray Pike,et al.  Comparison of convolution and matrix distributed memory systems for associative recall and recognition , 1984 .

[21]  Geoffrey E. Hinton,et al.  A Learning Algorithm for Boltzmann Machines , 1985, Cogn. Sci..

[22]  Geoffrey E. Hinton,et al.  Symbols Among the Neurons: Details of a Connectionist Inference Architecture , 1985, IJCAI.

[23]  B. Murdock Convolution and matrix systems: A reply to Pike. , 1985 .

[24]  Jordan B. Pollack,et al.  Massively Parallel Parsing: A Strongly Interactive Model of Natural Language Interpretation , 1988, Cogn. Sci..

[25]  J. Eich Levels of processing, encoding specificity, elaboration, and CHARM. , 1985, Psychological review.

[26]  Geoffrey E. Hinton,et al.  Learning internal representations by error propagation , 1986 .

[27]  Brian Falkenhainer,et al.  The Structure-Mapping Engine , 2003 .

[28]  P. Smolensky,et al.  Neural and conceptual interpretation of PDP models , 1986 .

[29]  James L. McClelland,et al.  Mechanisms of Sentence Processing: Assigning Roles to Constituents of Sentences , 1986 .

[30]  James L. McClelland,et al.  PDP models and general issues in cognitive science , 1986 .

[31]  James L. McClelland,et al.  Parallel distributed processing: explorations in the microstructure of cognition, vol. 1: foundations , 1986 .

[32]  Ronald Rosenfeld,et al.  Four Capacity Models for Coarse-Coded Symbol Memories , 1987 .

[33]  J N Lee,et al.  Optical implementations of associative networks with versatile adaptive learning capabilities. , 1987, Applied optics.

[34]  David S. Touretzky,et al.  A distributed connectionist representation for concept structures , 1987 .

[35]  S. Pinker,et al.  On language and connectionism: Analysis of a parallel distributed processing model of language acquisition , 1988, Cognition.

[36]  Lokendra Shastri,et al.  Semantic Networks: An Evidential Formalization and Its Connectionist Realization , 1988 .

[37]  Ronald Rosenfeld,et al.  Coarse-Coded Symbol Memories and Their Properties , 1988, Complex Syst..

[38]  Terrence J. Sejnowski,et al.  NETtalk: a parallel network that learns to read aloud , 1988 .

[39]  J. Fodor,et al.  Connectionism and cognitive architecture: A critical analysis , 1988, Cognition.

[40]  Elke U. Weber,et al.  Expectation and variance of item resemblance distributions in a convolution-correction model of distributed memory , 1988 .

[41]  Geoffrey E. Hinton,et al.  A Distributed Connectionist Production System , 1988, Cogn. Sci..

[42]  Jerome A. Feldman,et al.  Connectionist Models and Their Properties , 1982, Cogn. Sci..

[43]  Pentti Kanerva,et al.  Sparse Distributed Memory , 1988 .

[44]  Douglas F. Elliott,et al.  Handbook of Digital Signal Processing: Engineering Applications , 1988 .

[45]  Brian Falkenhainer,et al.  The Structure-Mapping Engine: Algorithm and Examples , 1989, Artif. Intell..

[46]  C. P. Dolan Tensor manipulation networks: connectionist and symbolic approaches to comprehension, learning, and planning , 1989 .

[47]  C. Lee Giles,et al.  Higher Order Recurrent Networks and Grammatical Inference , 1989, NIPS.

[48]  Geoffrey E. Hinton,et al.  Learning distributed representations of concepts. , 1989 .

[49]  Charles P. Dolan,et al.  Tensor Product Production System: a Modular Architecture and Representation , 1989 .

[50]  B. Murdock,et al.  Memory for Serial Order , 1989 .

[51]  Judea Pearl,et al.  Probabilistic reasoning in intelligent systems - networks of plausible inference , 1991, Morgan Kaufmann series in representation and reasoning.

[52]  M. Humphreys,et al.  Different Ways to Cue a Coherent Memory System: A Theory for Episodic, Semantic, and Procedural Tasks. , 1989 .

[53]  B. Ross Distinguishing Types of Superficial Similarities: Different Effects on the Access and Use of Earlier Problems , 1989 .

[54]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[55]  Geoffrey E. Hinton,et al.  Parallel Models of Associative Memory , 1989 .

[56]  Paul Thagard,et al.  Analogical Mapping by Constraint Satisfaction , 1989, Cogn. Sci..

[57]  Ronald J. Williams,et al.  A Learning Algorithm for Continually Running Fully Recurrent Neural Networks , 1989, Neural Computation.

[58]  James L. McClelland,et al.  Learning and Applying Contextual Constraints in Sentence Comprehension , 1990, Artif. Intell..

[59]  Geoffrey E. Hinton,et al.  Distributed Representations , 1986, The Philosophy of Artificial Intelligence.

[60]  Jeffrey L. Elman,et al.  Finding Structure in Time , 1990, Cogn. Sci..

[61]  Jordan B. Pollack,et al.  Recursive Distributed Representations , 1990, Artif. Intell..

[62]  Geoffrey E. Hinton Mapping Part-Whole Hierarchies into Connectionist Networks , 1990, Artif. Intell..

[63]  David J. Chalmers,et al.  Syntactic Transformations on Distributed Representations , 1990 .

[64]  Michael I. Jordan Attractor dynamics and parallelism in a connectionist sequential machine , 1990 .

[65]  Mark Derthick,et al.  Mundane Reasoning by Settling on a Plausible Model , 1990, Artif. Intell..

[66]  David S. Touretzky,et al.  BoltzCONS: Dynamic Symbol Structures in a Connectionist Network , 1990, Artif. Intell..

[67]  Yann LeCun,et al.  Reverse TDNN: An Architecture For Trajectory Generation , 1991, NIPS.

[68]  Robert L. Goldstone,et al.  Relational similarity and the nonindependence of features in similarity judgments , 1991, Cognitive Psychology.

[69]  Lokendra Shastri,et al.  Rules and Variables in Neural Nets , 1991, Neural Computation.

[70]  Paul Smolensky Tensor Product Variable Binding and the Representation of Symbolic Structures in Connectionist Systems , 1990, Artif. Intell..

[71]  Paul Smolensky,et al.  Distributed Recursive Structure Processing , 1991, SCAI.

[72]  Andrew S. Noetzel,et al.  Forcing Simple Recurrent Neural Networks to Encode Context , 1992 .

[73]  Géraldine Legendre,et al.  Principles for an Integrated Connectionist/Symbolic Theory of Higher Cognition ; CU-CS-600-92 , 1992 .

[74]  Colin Giles,et al.  Learning Context-free Grammars: Capabilities and Limitations of a Recurrent Neural Network with an External Stack Memory , 1992 .

[75]  Jeffrey A. Hadley,et al.  Output and retrieval interference in the missing-number task , 1992, Memory & cognition.

[76]  C. Lee Giles,et al.  Learning and Extracting Finite State Automata with Second-Order Recurrent Neural Networks , 1992, Neural Computation.

[77]  Radford M. Neal Connectionist Learning of Belief Networks , 1992, Artif. Intell..

[78]  Arthur B. Markman,et al.  Analogy-- Watershed or Waterloo? Structural alignment and the development of connectionist models of analogy , 1992, NIPS 1992.

[79]  Raymond L. Watrous,et al.  Induction of Finite-State Languages Using Second-Order Recurrent Networks , 1992, Neural Computation.

[80]  Geoffrey E. Hinton,et al.  Developing Population Codes by Minimizing Description Length , 1993, NIPS.

[81]  Kenneth D. Forbus,et al.  The Roles of Similarity in Transfer: Separating Retrievability From Inferential Soundness , 1993, Cognitive Psychology.

[82]  B B Murdock,et al.  TODAM2: a model for the storage and retrieval of item, associative, and serial-order information. , 1993, Psychological review.

[83]  Bruce J. MacLennan,et al.  Characteristics of connectionist knowledge representation , 1991, Inf. Sci..

[84]  Kenneth D. Forbus,et al.  MAC/FAC: A Model of Similarity-Based Retrieval , 1995, Cogn. Sci..

[85]  Jonathan Baxter,et al.  Learning internal representations , 1995, COLT '95.