Tolerating Concept and Sampling Shift in Lazy Learning Using Prediction Error Context Switching

In their unmodified form, lazy-learning algorithms may have difficulty learning and tracking time-varying input/output function maps such as those that occur in concept shift. Extensions of these algorithms, such as Time-Windowed Forgetting (TWF), can permit learning of time-varying mappings by deleting older exemplars, but they have decreased classification accuracy when the input-space sampling distribution of the learning set is time-varying. Additionally, TWF suffers from lower asymptotic classification accuracy than equivalent non-forgetting algorithms when the input sampling distributions are stationary. Other shift-sensitive algorithms, such as Locally-Weighted Forgetting (LWF), avoid the negative effects of time-varying sampling distributions but still have lower asymptotic classification accuracy in the stationary case. We introduce Prediction Error Context Switching (PECS), which allows lazy-learning algorithms to achieve good classification accuracy when both the function mapping and the input sampling distribution are time-varying, while maintaining their asymptotic classification accuracy on static tasks. PECS works by selecting and re-activating previously stored instances based on their most recent consistency records. The classification accuracy and active learning-set sizes of the above algorithms are compared on a set of learning tasks that illustrate the differing time-varying conditions described above. The results show that PECS has the best overall classification accuracy across these conditions, while remaining competitive in asymptotic accuracy with unmodified lazy learners intended for static environments.
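The core mechanism described above, keeping stored exemplars but toggling them between an active set and an inactive cache according to their recent prediction-error record, can be illustrated with a small sketch. The following Python code is a minimal, simplified illustration of that idea around a 1-nearest-neighbour learner; the window size, error-rate thresholds, and class/method names (`Exemplar`, `PECSClassifier`, `learn`) are illustrative assumptions, not the paper's actual parameters or confidence-interval tests.

```python
import math
from dataclasses import dataclass, field


@dataclass
class Exemplar:
    """A stored instance with a record of its recent prediction consistency."""
    x: tuple                                     # input features
    y: int                                       # class label
    errors: list = field(default_factory=list)   # recent 0/1 error history
    active: bool = True                          # in the active learning set?


class PECSClassifier:
    """Toy 1-NN learner with PECS-style deactivation and re-activation.

    Exemplars whose recent error rate exceeds `deactivate_thresh` are moved
    to an inactive cache; cached exemplars whose recent record improves past
    `reactivate_thresh` are restored to the active learning set.
    (Thresholds and window size are illustrative, not the paper's values.)
    """

    def __init__(self, window=10, deactivate_thresh=0.5, reactivate_thresh=0.2):
        self.window = window
        self.deactivate_thresh = deactivate_thresh
        self.reactivate_thresh = reactivate_thresh
        self.store = []  # all exemplars, active and inactive

    def _nearest(self, x, active_only=True):
        pool = [e for e in self.store if e.active] if active_only else self.store
        if not pool:
            return None
        return min(pool, key=lambda e: math.dist(e.x, x))

    def predict(self, x):
        nn = self._nearest(x, active_only=True)
        return nn.y if nn is not None else None

    def learn(self, x, y):
        # Update the consistency record of the nearest stored exemplar,
        # whether active or inactive, so dormant exemplars can recover
        # when their old context returns.
        nn = self._nearest(x, active_only=False)
        if nn is not None:
            nn.errors.append(0 if nn.y == y else 1)
            nn.errors = nn.errors[-self.window:]

        # Re-evaluate activation status from recent error rates.
        for e in self.store:
            if len(e.errors) >= self.window:
                rate = sum(e.errors) / len(e.errors)
                if e.active and rate > self.deactivate_thresh:
                    e.active = False   # recently inconsistent: shelve it
                elif not e.active and rate < self.reactivate_thresh:
                    e.active = True    # recently consistent again: restore it

        # Store the new observation as an exemplar.
        self.store.append(Exemplar(x=tuple(x), y=y))
```

Because no exemplar is permanently deleted, a concept that recurs after a shift can be served again by re-activating its old exemplars, which is what lets this scheme keep its asymptotic accuracy on static tasks while still tracking time-varying ones.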
