Distributed Representations

Given a network of simple computing elements and some entities to be represented, the most straightforward scheme is to use one computing element for each entity. This is called a local representation. It is easy to understand and easy to implement because the structure of the physical network mirrors the structure of the knowledge it contains. This report describes a different type of representation that is less familiar and harder to think about than local representations. Each entity is represented by a pattern of activity distributed over many computing elements, and each computing element is involved in representing many different entities. The strength of this more complicated kind of representation does not lie in its notational convenience or its ease of implementation in a conventional computer, but rather in the efficiency with which it makes use of the processing abilities of networks of simple, neuron-like computing elements. Every representational scheme has its good and bad points, and distributed representations are no exception. Some desirable properties, like content-addressable memory and automatic generalization, arise very naturally from the use of patterns of activity as representations. Other properties, like the ability to temporarily store a large set of arbitrary associations, are much harder to achieve. The best psychological evidence for distributed representations is the degree to which their strengths and weaknesses match those of the human mind.

This research was supported by a grant from the System Development Foundation. I thank Jim Anderson, Dave Ackley, Dana Ballard, Francis Crick, Scott Fahlman, Jerry Feldman, Christopher Longuet-Higgins, Don Norman, Terry Sejnowski, and Tim Shallice for helpful discussions. Jay McClelland and Dave Rumelhart helped me refine and rewrite many of the ideas presented here. A substantially revised version of this report will appear as a chapter by Hinton, McClelland, and Rumelhart in Parallel Distributed Processing: Explorations in the Microstructure of Cognition, edited by McClelland and Rumelhart.
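The contrast between local and distributed schemes, and the claim that content-addressable memory arises naturally from patterns of activity, can be made concrete with a small sketch. The Python fragment below is illustrative only and not the report's own model: the Hopfield-style auto-associator, the Hebbian outer-product storage rule, and the unit and pattern counts are all assumptions chosen for a minimal demonstration.

```python
# Minimal sketch (assumed, not from the report): local vs. distributed codes,
# and content-addressable recall from a Hopfield-style auto-associator.
import numpy as np

rng = np.random.default_rng(0)
n_units, n_entities = 64, 5

# Local scheme: one unit per entity (a one-hot code).
# Shown only for contrast; it offers no pattern overlap to exploit.
local = np.eye(n_units)[:n_entities]

# Distributed scheme: each entity is a +/-1 activity pattern over all units,
# so every unit takes part in representing every entity.
distributed = rng.choice([-1.0, 1.0], size=(n_entities, n_units))

# Hebbian outer-product storage (Hopfield-style weights, zero diagonal).
W = distributed.T @ distributed / n_units
np.fill_diagonal(W, 0.0)

def recall(cue, steps=5):
    """Complete a degraded cue by repeatedly settling on the stored weights."""
    state = cue.copy()
    for _ in range(steps):
        state = np.sign(W @ state)
        state[state == 0] = 1.0
    return state

# Content-addressable memory: corrupt a quarter of one stored pattern
# and recover the whole pattern from the degraded cue.
target = distributed[2]
cue = target.copy()
flip = rng.choice(n_units, size=n_units // 4, replace=False)
cue[flip] *= -1.0

print("overlap with target before settling:", np.mean(cue == target))
print("overlap with target after settling: ", np.mean(recall(cue) == target))
```

The point of the sketch is only that retrieval is driven by the content of the cue itself rather than by an address: a partial or noisy pattern settles to the nearest stored pattern because every unit carries a little evidence about every entity.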
