A major difference between human learning and machine learning is that humans can reflect on their own learning behavior and adapt it to the typical learning tasks of a given environment. To take some initial theoretical steps towards 'introspective' machine learning, I present, as a thought experiment, a 'self-referential' recurrent neural network which can run and actively modify its own weight change algorithm. The network has special input units for observing its own failures and successes. Each of its connections has an address. The network has additional special input and output units for sequentially addressing, analyzing and manipulating all of its own adaptive components (weights), including those weights responsible for addressing, analyzing and manipulating weights. Due to the generality of the architecture, there are no theoretical limits to the sophistication of the modified weight change algorithms running on the network (except for unavoidable pre-wired time and storage constraints). In theory, the network's weight matrix can learn not only to change itself, but also to learn the way it changes itself, and the way it changes the way it changes itself, and so on ad infinitum. No endless recursion is involved, however. For one variant of the architecture, I present a simple but general initial reinforcement learning algorithm. For another variant, I derive a more complex exact gradient-based algorithm for supervised sequence learning. A disadvantage of the latter algorithm is its computational complexity per time step, which is independent of the sequence length and equals O(n_conn^2 log n_conn), where n_conn is the number of connections. Another disadvantage is the high number of local minima of the unusually complex error surface. The purpose of my thought experiment, however, is not to come up with the most efficient or most practical 'introspective' or 'self-referential' weight change algorithm, but to show that such algorithms are possible at all.
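The core mechanism, addressable connections plus special units that read and write the network's own weights, can be illustrated with a minimal toy sketch. This is not the paper's architecture or learning algorithm; the unit layout, sizes, and update rule below are illustrative assumptions only, chosen to show how a net's outputs can address, analyze, and manipulate the same weight matrix that produces those outputs.

```python
import numpy as np

class SelfReferentialNet:
    """Toy recurrent net that addresses, reads, and modifies its own weights.

    Assumed (not from the paper): unit 0 takes external input, unit 1 takes
    the 'analyzed' weight value back as input, units 2-3 act as address
    outputs, unit 4 as a weight-change output, unit 5 as the ordinary output.
    """

    def __init__(self, n_units=8, seed=0):
        rng = np.random.default_rng(seed)
        self.n_units = n_units
        # Each connection W[i, j] has an address (i, j).
        self.W = 0.1 * rng.standard_normal((n_units, n_units))
        self.h = np.zeros(n_units)   # recurrent state
        self.analyzed = 0.0          # special input: last weight read

    def step(self, x):
        # Ordinary recurrent dynamics; the previously analyzed weight value
        # is fed back in, so behavior can depend on the net's own weights.
        pre = self.W @ np.tanh(self.h)
        pre[0] += x
        pre[1] += self.analyzed
        self.h = np.tanh(pre)
        # Decode an address from the address units (map (-1,1) to indices).
        n = self.n_units
        i = int((self.h[2] * 0.5 + 0.5) * (n - 1))
        j = int((self.h[3] * 0.5 + 0.5) * (n - 1))
        self.analyzed = self.W[i, j]       # analyze: read own weight
        self.W[i, j] += 0.01 * self.h[4]   # manipulate: change own weight
        return self.h[5]                   # ordinary output unit
```

Because the modified entries of W include the very weights feeding the address and weight-change units, the net can, in principle, alter the way it alters itself, which is the point of the thought experiment; what is missing here is any learning algorithm shaping these self-modifications.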