Holistic processing of hierarchical structures in connectionist networks

Despite the success of connectionist systems in modelling some aspects of cognition, critics argue that their lack of symbol processing makes them inadequate for modelling high-level cognitive tasks that require the representation and processing of hierarchical structures. In this thesis, we investigate four mechanisms for encoding hierarchical structures in distributed representations that are suitable for processing in connectionist systems: Tensor Product Representation, Recursive Auto-Associative Memory (RAAM), Holographic Reduced Representation (HRR), and Binary Spatter Code (BSC). In these four schemes, representations of hierarchical structures are either learned in a connectionist network or constructed from binary or real-valued vectors by means of various mathematical operations. It is argued that the resulting representations carry structural information without being themselves syntactically structured. The structural information about a represented object is encoded in the position of its representation in a high-dimensional representational space. We use Principal Component Analysis and constructivist networks to show that well-separated clusters of representations for structurally similar hierarchical objects are formed in the representational spaces of RAAMs and HRRs. This spatial structure supports holistic yet structure-sensitive processing of HRRs and RAAM representations. Holistic operations on RAAM representations can be learned by backpropagation networks. However, holistic operators over HRRs, Tensor Products, and BSCs have to be constructed by hand, which is undesirable. We propose two new algorithms for learning holistic transformations of HRRs from examples. These algorithms are able to generalise the acquired knowledge to hierarchical objects of higher complexity than the training examples. Such generalisations exhibit systematicity of a degree which, to the best of our knowledge, has not yet been achieved by any other comparable learning method. Finally, we outline how a number of holistic transformations can be learned in parallel and applied to representations of structurally different objects. The ability to distinguish and perform a number of different structure-sensitive operations is one step towards a connectionist architecture that is capable of modelling complex high-level cognitive tasks such as natural language processing and logical inference.
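To make the idea of constructing representations "by means of mathematical operations" concrete, the following is a minimal sketch of how a Holographic Reduced Representation is commonly built: role and filler vectors are bound by circular convolution, bindings are superimposed by addition, and a filler is approximately recovered by circular correlation. This illustrates the standard HRR formulation, not the thesis's own implementation or its learning algorithms. The dimension of 256 and the names `agent`, `obj`, `john`, and `mary` are assumptions chosen purely for illustration.

```python
import math
import random


def random_vector(n, rng):
    """Draw a vector with i.i.d. N(0, 1/n) elements, the usual choice for HRRs."""
    sigma = 1.0 / math.sqrt(n)
    return [rng.gauss(0.0, sigma) for _ in range(n)]


def bind(a, b):
    """Circular convolution: c[k] = sum_j a[j] * b[(k - j) mod n]."""
    n = len(a)
    return [sum(a[j] * b[(k - j) % n] for j in range(n)) for k in range(n)]


def involution(a):
    """Approximate inverse under circular convolution: a*[k] = a[-k mod n]."""
    n = len(a)
    return [a[(-k) % n] for k in range(n)]


def unbind(c, a):
    """Circular correlation: returns a noisy copy of b when c contains bind(a, b)."""
    return bind(involution(a), c)


def superpose(*vectors):
    """Element-wise sum of several vectors of equal length."""
    return [sum(xs) for xs in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


rng = random.Random(0)
n = 256  # illustrative dimension (assumption)

# Hypothetical roles and fillers for a flat structure such as "john ... mary".
agent, obj = random_vector(n, rng), random_vector(n, rng)
john, mary = random_vector(n, rng), random_vector(n, rng)
fillers = {"john": john, "mary": mary}

# The whole structure is a single vector of the same dimension as its parts.
trace = superpose(bind(agent, john), bind(obj, mary))

# Decoding the agent role yields a noisy vector; clean it up by nearest filler.
decoded = unbind(trace, agent)
best = max(fillers, key=lambda name: cosine(decoded, fillers[name]))
print(best)
```

Because `trace` has the same dimension as its constituents, it can itself be bound to a role and superimposed with other bindings, which is how nested (hierarchical) structures are encoded. The direct O(n²) convolution above is chosen for readability; an FFT-based implementation would typically be used for larger dimensions.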
