Tower of Hanoi with Connectionist Networks: Learning New Features

A connectionist system previously used to solve the numerical control task of balancing a pole (Barto, Sutton, and Anderson, 1983; Anderson, 1987) is applied to a Tower of Hanoi puzzle. The system consists of two networks: an evaluation network that learns an evaluation function over states, and an action network that learns to select actions as a function of the puzzle's state and previous actions. The initial state representation is insufficient: new features must be learned before a useful evaluation function can be formed. The methodology is compared with that of Langley's (1985) adaptive production system, SAGE.2.
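
The two-network arrangement described above can be illustrated with a minimal sketch. It is not the system reported here: it assumes a three-disk puzzle, a one-hot (disk, peg) state encoding, linear units in place of multilayer networks, a temporal-difference error as the shared learning signal, and an action component that ignores previous actions. All names (`TwoNetworkLearner`, `features`, `legal_actions`, `apply_action`) and the step sizes are hypothetical. Because both components are linear in the raw encoding, the sketch learns no new features; it only shows how an evaluation component and an action component might be coupled.

```python
import math
import random

N_DISKS = 3  # assumption: three-disk puzzle
N_PEGS = 3
# Every ordered (source peg, destination peg) pair.
ACTIONS = [(s, d) for s in range(N_PEGS) for d in range(N_PEGS) if s != d]


def legal_actions(state):
    """state[d] is the peg holding disk d; disk 0 is the smallest."""
    tops = {}
    for disk, peg in enumerate(state):
        tops.setdefault(peg, disk)  # smallest disk on a peg is its top disk
    moves = []
    for src, disk in tops.items():
        for dst in range(N_PEGS):
            # An empty peg behaves as if it held a disk larger than all others.
            if dst != src and tops.get(dst, N_DISKS) > disk:
                moves.append((src, dst))
    return sorted(moves)


def apply_action(state, action):
    """Move the top disk of the source peg to the destination peg."""
    src, dst = action
    disk = state.index(src)  # first match is the smallest disk on src
    return state[:disk] + (dst,) + state[disk + 1:]


def features(state):
    """One-hot (disk, peg) encoding; an assumed initial representation."""
    x = [0.0] * (N_DISKS * N_PEGS)
    for disk, peg in enumerate(state):
        x[disk * N_PEGS + peg] = 1.0
    return x


def dot(w, x):
    return sum(wi * xi for wi, xi in zip(w, x))


class TwoNetworkLearner:
    """Linear stand-ins for the evaluation and action networks."""

    def __init__(self, beta=0.1, alpha=0.1, gamma=0.9, seed=0):
        n = N_DISKS * N_PEGS
        self.w = [0.0] * n                        # evaluation weights
        self.u = {a: [0.0] * n for a in ACTIONS}  # action-preference weights
        self.beta, self.alpha, self.gamma = beta, alpha, gamma
        self.rng = random.Random(seed)

    def evaluate(self, state):
        return dot(self.w, features(state))

    def select(self, state):
        """Sample a legal action with softmax probabilities over preferences."""
        x = features(state)
        acts = legal_actions(state)
        weights = [math.exp(dot(self.u[a], x)) for a in acts]
        return self.rng.choices(acts, weights=weights)[0]

    def learn(self, state, action, reward, next_state, done):
        """Update both components from one temporal-difference error."""
        x = features(state)
        future = 0.0 if done else self.gamma * self.evaluate(next_state)
        delta = reward + future - self.evaluate(state)
        for i, xi in enumerate(x):
            self.w[i] += self.beta * delta * xi
            self.u[action][i] += self.alpha * delta * xi
        return delta
```

A training loop would repeatedly call `select`, `apply_action`, and `learn` until all disks reach the goal peg; the reward signal supplied to `learn` is left to the caller and is likewise an assumption of this sketch.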