Pasting Bites Together For Prediction In Large Data Sets And On-Line

Many data bases have grown to the point where they cannot fit into the fast memory of even large-memory machines, to say nothing of current workstations. If we want to use these data bases to construct predictions of various characteristics, then, since the usual methods require that all data be held in fast memory, various work-arounds have to be used. This paper studies one such class of methods, which are computationally fast and give accuracy comparable to that which could have been obtained if all the data could have been held in core. The procedure takes small bites of the data, grows a predictor on each bite, and then pastes these predictors together. The methods are also applicable to on-line learning.
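As a rough illustration of the bite-and-paste idea, the following Python sketch grows a simple predictor on each of several small bites and combines them by voting. It is a minimal sketch under stated assumptions, not the procedure studied in the paper: drawing each bite as a uniform random sample, pasting by unweighted majority vote, and using a one-feature threshold rule in place of a tree are all choices made here for illustration. The names fit_stump and paste_bites are hypothetical.

```python
import random
from collections import Counter


def fit_stump(rows):
    """Fit a one-feature threshold rule to rows of (x, label).

    Assumption: this tiny rule stands in for whatever predictor is
    grown on a bite (for example, a tree).
    """
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        left_label = Counter(left).most_common(1)[0][0]
        # If nothing falls to the right, reuse the left label.
        right_label = Counter(right).most_common(1)[0][0] if right else left_label
        errors = sum(y != left_label for y in left)
        errors += sum(y != right_label for y in right)
        if best is None or errors < best[0]:
            best = (errors, t, left_label, right_label)
    _, t, left_label, right_label = best
    return lambda x: left_label if x <= t else right_label


def paste_bites(data, bite_size, n_bites, seed=0):
    """Grow one predictor per small bite and paste them together.

    Assumptions: each bite is a uniform random sample, and pasting
    is an unweighted majority vote over the per-bite predictors.
    """
    rng = random.Random(seed)
    predictors = [fit_stump(rng.sample(data, bite_size)) for _ in range(n_bites)]

    def predict(x):
        votes = Counter(p(x) for p in predictors)
        return votes.most_common(1)[0][0]

    return predict


# Toy data (made up for this sketch): the label is 1 when x > 0.5.
rng = random.Random(1)
data = [(x, int(x > 0.5)) for x in (rng.random() for _ in range(2000))]

# Only 50 rows are examined at a time, never the full data set.
predict = paste_bites(data, bite_size=50, n_bites=25)
accuracy = sum(predict(x) == y for x, y in data) / len(data)
print(f"accuracy of pasted predictor: {accuracy:.3f}")
```

Because each predictor sees only bite_size rows, the same loop could in principle read its bites from disk or from an arriving stream rather than from a list held in memory.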