Generalization Abilities of Cascade Network Architecture

A new incremental cascade network architecture was presented in [5]. This paper discusses the properties of such cascade networks and investigates their generalization abilities under the particular constraint of small data sets. The evaluation uses cascade networks consisting of local linear maps, with the Mackey-Glass time series prediction task as a benchmark. Our results indicate that, to bring the potential of large networks to bear on the problem of extracting information from small data sets without running the risk of overfitting, deeply cascaded network architectures are preferable to shallow, broad architectures containing the same number of nodes.
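
For readers unfamiliar with the benchmark, the following is a minimal sketch of how a Mackey-Glass series and a small prediction data set might be generated. The abstract does not state the parameters or the input encoding used, so everything here is an assumption made for illustration: the values tau = 17, beta = 0.2, gamma = 0.1, n = 10 are the ones commonly quoted for this benchmark, the Euler step and the constant initial history are simplifications, and the function names `mackey_glass` and `make_windows`, together with the lag and horizon settings, are hypothetical.

```python
def mackey_glass(n_samples, tau=17, beta=0.2, gamma=0.1, n=10, dt=1.0, x0=1.2):
    """Euler-integrate the Mackey-Glass delay differential equation.

    dx/dt = beta * x(t - tau) / (1 + x(t - tau)**n) - gamma * x(t)

    All default parameter values are assumptions, not taken from the paper.
    """
    delay_steps = int(round(tau / dt))
    # Constant history before t = 0 (an assumption; other initialisations exist).
    series = [x0] * (delay_steps + 1)
    for _ in range(n_samples - 1):
        x_now = series[-1]
        x_delayed = series[-1 - delay_steps]
        dx = beta * x_delayed / (1.0 + x_delayed ** n) - gamma * x_now
        series.append(x_now + dt * dx)
    # Drop the artificial history so that exactly n_samples values remain.
    return series[delay_steps:]


def make_windows(series, lags=(0, 6, 12, 18), horizon=6):
    """Build (input, target) pairs for a prediction task.

    Each input holds the values series[t - lag] for the given lags, and the
    target is series[t + horizon]. The lags and horizon are hypothetical.
    """
    max_lag = max(lags)
    inputs, targets = [], []
    for t in range(max_lag, len(series) - horizon):
        inputs.append([series[t - lag] for lag in lags])
        targets.append(series[t + horizon])
    return inputs, targets


series = mackey_glass(500)
inputs, targets = make_windows(series)
```

Restricting training to a short prefix of `inputs` and `targets` would be one simple way to mimic the small-data-set setting that the abstract refers to.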