The Evolution of Minimal Catastrophic Forgetting in Neural Systems

Tebogo Seipone (T.Seipone@cs.bham.ac.uk)
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK

John A. Bullinaria (J.A.Bullinaria@cs.bham.ac.uk)
School of Computer Science, The University of Birmingham, Birmingham, B15 2TT, UK

Abstract

It is well known that neural systems can suffer catastrophic forgetting of previously learned patterns when trained on new patterns, and that this renders many cognitive models unrealistic. Through evolution, however, humans have arrived at mechanisms that minimize this problem, and in this paper we aim to show how simulated evolution can be used to generate neural network models with significantly less catastrophic forgetting than traditionally formulated models.

Introduction

It is normal for humans to gradually forget what they have previously learned, particularly while learning new information. In traditional artificial neural networks, however, the forgetting is considerably more catastrophic, and this proves to be a serious limitation of such cognitive models (McCloskey & Cohen, 1989; Ratcliff, 1990; French, 1999). Human brains have presumably evolved by natural selection to minimize this problem. The aim of this paper is to show how simulated evolution can be used to minimize the problem in artificial neural networks too.

We begin by describing the problem of catastrophic forgetting in more detail and outlining the principal previous approaches to reducing it. We then discuss the possibilities for evolving artificial neural networks, and explain the particular approach adopted for this study. In our largest section, we present a series of simulation results, comparing each of our evolved systems against the baseline of traditionally built systems. We end with some discussion and conclusions.

Catastrophic Forgetting

After a neural network has been trained on one set of patterns, training on a new set can seriously disrupt, or cause loss of, the previously learned patterns. This ‘catastrophic forgetting’ is a direct consequence of the stability/plasticity dilemma, and was first investigated in detail by McCloskey & Cohen (1989) and Ratcliff (1990). Since then, various approaches have been studied in an attempt to reduce or eliminate it (French, 1999).

Several approaches are based on the idea that if some, or all, of the previous information is re-learned together with any new information, then the network will not forget the old information. This process is called interleaved learning (Ratcliff, 1990; McClelland, McNaughton & O’Reilly, 1995). The need to store the old patterns, which may be impractical (and is also rather implausible for cognitive systems), can be avoided by ‘pseudo-rehearsal’, which involves creating pseudo-items (that approximate the original items) to learn alongside the new items (Robins, 1995). Of course, this still requires storage of the pseudo-items, and it is far from obvious how best to create pseudo-items that represent the old items sufficiently well.
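As a simple illustration of the pseudo-rehearsal idea, the following Python/NumPy fragment trains a small back-propagation network on an ‘old’ set of random binary patterns, probes the trained network with random inputs to generate pseudo-items, and then learns a ‘new’ set interleaved with those pseudo-items. This is only a sketch of the general mechanism under arbitrary assumptions (network size, number of pseudo-items, learning settings); it is not the specific procedure of Robins (1995), nor the one used later in this paper.

import numpy as np

rng = np.random.default_rng(0)

def forward(W1, W2, x):
    """Two-layer sigmoid network: input -> hidden -> output."""
    h = 1.0 / (1.0 + np.exp(-x @ W1))
    y = 1.0 / (1.0 + np.exp(-h @ W2))
    return h, y

def train(W1, W2, inputs, targets, epochs=500, lr=0.5):
    """Plain back-propagation on a fixed batch of patterns."""
    for _ in range(epochs):
        h, y = forward(W1, W2, inputs)
        d_out = (y - targets) * y * (1.0 - y)
        d_hid = (d_out @ W2.T) * h * (1.0 - h)
        W2 -= lr * h.T @ d_out
        W1 -= lr * inputs.T @ d_hid
    return W1, W2

def make_pseudo_items(W1, W2, n_items, n_inputs):
    """Pseudo-rehearsal: probe the trained network with random binary
    inputs and record its own outputs as surrogate targets."""
    pseudo_x = rng.integers(0, 2, size=(n_items, n_inputs)).astype(float)
    _, pseudo_y = forward(W1, W2, pseudo_x)
    return pseudo_x, pseudo_y

# Toy task: two disjoint sets of random binary associations.
n_in, n_hid, n_out = 8, 16, 8
old_x = rng.integers(0, 2, size=(10, n_in)).astype(float)
old_y = rng.integers(0, 2, size=(10, n_out)).astype(float)
new_x = rng.integers(0, 2, size=(10, n_in)).astype(float)
new_y = rng.integers(0, 2, size=(10, n_out)).astype(float)

W1 = rng.normal(0, 0.5, size=(n_in, n_hid))
W2 = rng.normal(0, 0.5, size=(n_hid, n_out))

# Stage 1: learn the old patterns, then build pseudo-items from the network.
W1, W2 = train(W1, W2, old_x, old_y)
px, py = make_pseudo_items(W1, W2, n_items=32, n_inputs=n_in)

# Stage 2: learn the new patterns interleaved with the pseudo-items,
# so the old mapping is rehearsed without storing the old patterns.
W1, W2 = train(W1, W2, np.vstack([new_x, px]), np.vstack([new_y, py]))

_, y_old = forward(W1, W2, old_x)
print("retention of old patterns:", np.mean((y_old > 0.5) == (old_y > 0.5)))

Training on the new set alone, without the pseudo-items, typically leaves far fewer of the old output bits correct, which is the catastrophic forgetting effect at issue here.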
Given that the main cause of catastrophic forgetting is interference in the shared weights, many approaches have attempted to reduce that interference, for example by restricting the way in which the hidden unit activations are distributed, and hence how the connections are used. French (1991) used activation sharpening algorithms to reduce the overlap between hidden unit activations. The Sharkey & Sharkey (1995) HARM model uses a neural network implementation of a lookup table, dividing the learning task into two sub-tasks: first eliminating the overlap in the input patterns, and then producing appropriate outputs from the hidden nodes.

Other approaches allow two sets of weighted connections between nodes. Hinton & Plaut (1987) used dual additive weights, with fast weights to learn new patterns and slow weights for long-term storage. Related approaches are based on the belief that humans do not suffer from catastrophic forgetting because their brains have evolved two distinct areas to deal with the problem (McClelland, McNaughton & O’Reilly, 1995): the hippocampal system deals with learning new information, whilst the neocortical system slowly consolidates that new information with the old for long-term storage, using some form of interleaved learning. Dual-model architectures consisting of two distinct networks, one for early processing and another for long-term storage of previously learned information, together with an interfacing mechanism, have been developed to simulate this separation (French, 1997; Ans & Rousset, 1997).

All these approaches are based on variations of traditional neural networks, with the designers themselves deciding on the architecture, node activation functions, learning algorithms, the various parameter values, and so on. In this paper, we aim to show how evolutionary computation techniques can be used to evolve neural systems that suffer less catastrophic interference than traditionally built systems.
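As a similarly simplified illustration of the dual-weight idea mentioned above, the sketch below gives every connection in a one-layer network both a slow weight and a rapidly decaying fast weight, with the network always using their sum. It is a minimal caricature under our own assumptions (a single layer, arbitrary learning rates and decay), not Hinton & Plaut’s (1987) actual model or their deblurring procedure.

import numpy as np

rng = np.random.default_rng(2)

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Each connection has a slowly changing weight plus a fast, rapidly
# decaying weight; the network always uses their sum.
n_in, n_out = 8, 8
slow = rng.normal(0, 0.3, (n_in, n_out))
fast = np.zeros((n_in, n_out))

def train(inputs, targets, epochs=300,
          lr_slow=0.05,   # slow weights change little on any one pattern set
          lr_fast=0.5,    # fast weights absorb most of the new learning...
          decay=0.95):    # ...and decay back towards zero each epoch
    global slow, fast
    for _ in range(epochs):
        y = sigmoid(inputs @ (slow + fast))        # effective weight = slow + fast
        grad = inputs.T @ ((y - targets) * y * (1 - y))
        slow -= lr_slow * grad
        fast = decay * fast - lr_fast * grad

old_x = rng.integers(0, 2, (10, n_in)).astype(float)
old_y = rng.integers(0, 2, (10, n_out)).astype(float)
new_x = rng.integers(0, 2, (10, n_in)).astype(float)
new_y = rng.integers(0, 2, (10, n_out)).astype(float)

train(old_x, old_y)    # old patterns end up mainly in the slow weights
train(new_x, new_y)    # new patterns are carried largely by the fast weights

y_old = sigmoid(old_x @ (slow + fast))
print("old patterns still correct:", np.mean((y_old > 0.5) == (old_y > 0.5)))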
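Finally, to make the evolutionary approach concrete before the detailed description in the following sections, the sketch below shows one generic way such an evaluation could be organised: each genome encodes a few learning parameters, fitness is measured as the retention of an old pattern set after a new set has been learned, and the fittest genomes are selected and mutated. The choice of evolved parameters, the toy binary task, and the truncation selection with Gaussian mutation are all assumptions made for this example, not the procedure actually used in this study.

import numpy as np

rng = np.random.default_rng(1)

def sigmoid(z):
    # Clipping keeps np.exp well behaved for the larger evolved learning rates.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))

def run_lifetime(genome, data, n_in=8, n_hid=16, n_out=8, epochs=300):
    """Train a small back-propagation network first on the 'old' pattern set
    and then on the 'new' set, using the evolved parameters in `genome`.
    Returns the fraction of old-pattern output bits still correct afterwards."""
    lr_old, lr_new, init_scale = genome
    (old_x, old_y), (new_x, new_y) = data
    W1 = rng.normal(0, init_scale, (n_in, n_hid))
    W2 = rng.normal(0, init_scale, (n_hid, n_out))
    for x, t, lr in ((old_x, old_y, lr_old), (new_x, new_y, lr_new)):
        for _ in range(epochs):
            h = sigmoid(x @ W1)
            y = sigmoid(h @ W2)
            d_out = (y - t) * y * (1 - y)
            d_hid = (d_out @ W2.T) * h * (1 - h)
            W2 -= lr * h.T @ d_out
            W1 -= lr * x.T @ d_hid
    y_old = sigmoid(sigmoid(old_x @ W1) @ W2)
    return np.mean((y_old > 0.5) == (old_y > 0.5))   # retention = fitness

def random_task(n=10, n_in=8, n_out=8):
    mk = lambda r, c: rng.integers(0, 2, (r, c)).astype(float)
    return (mk(n, n_in), mk(n, n_out)), (mk(n, n_in), mk(n, n_out))

# A deliberately small generational GA over three learning parameters.
pop = [np.array([rng.uniform(0.01, 1.0),    # learning rate, old task
                 rng.uniform(0.01, 1.0),    # learning rate, new task
                 rng.uniform(0.1, 2.0)])    # initial weight scale
       for _ in range(12)]

for generation in range(20):
    data = random_task()                    # fresh random task each generation
    fitness = [run_lifetime(g, data) for g in pop]
    order = np.argsort(fitness)[::-1]
    parents = [pop[i] for i in order[:4]]   # truncation selection
    pop = [np.maximum(p + rng.normal(0, 0.05, 3), 1e-3)   # Gaussian mutation
           for p in parents for _ in range(3)]
    print(f"gen {generation:2d}  best retention = {fitness[order[0]]:.2f}")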

References

[1] John A. Bullinaria. Evolving efficient learning algorithms for binary mappings, 2003, Neural Networks.

[2] X. Yao. Evolving Artificial Neural Networks, 1999.

[3] Geoffrey E. Hinton. Using fast weights to deblur old memories, 1987.

[4] Bernard Ans, et al. Avoiding catastrophic forgetting by coupling two reverberating neural networks, 1997.

[5] James L. McClelland, et al. Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory, 1995, Psychological Review.

[6] R. French. Catastrophic forgetting in connectionist networks, 1999, Trends in Cognitive Sciences.

[7] Robert M. French, et al. Using Semi-Distributed Representations to Overcome Catastrophic Forgetting in Connectionist Networks, 1991.

[8] Robert M. French, et al. Pseudo-recurrent Connectionist Networks: An Approach to the 'Sensitivity-Stability' Dilemma, 1997, Connection Science.

[9] Robert Dale, et al. Proceedings of the Thirteenth Annual Conference of the Cognitive Science Society, 1991.

[10] Noel E. Sharkey, et al. An Analysis of Catastrophic Interference, 1995, Connection Science.

[11] Michael McCloskey, et al. Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem, 1989.

[12] Anthony V. Robins, et al. Catastrophic Forgetting, Rehearsal and Pseudorehearsal, 1995, Connection Science.

[13] R. Ratcliff. Connectionist models of recognition memory: constraints imposed by learning and forgetting functions, 1990, Psychological Review.