When is adaptive better than optimal?

Given a stationary process, we predict it with a first-order predictor whose single coefficient is adapted to the current observations by a constant-gain identification algorithm. We investigate the prediction error variance as a function of the adaptation gain, i.e., of the memory length (the number of observations) of the identification scheme. An infinite memory corresponds to the asymptotically constant optimal predictor, and a finite memory to a locally adaptive, time-varying predictor. We show that, in some specified situations, the prediction error variance associated with the finite-memory adaptation scheme is smaller than the optimal variance. This can occur only if the model is misspecified, i.e., if the structure of the optimal predictor is too simple.