Kalman Filter for Online Classification of Non-Stationary Data

In Online Continual Learning (OCL) a learning system receives a stream of data and sequentially performs prediction and training steps. Important challenges in OCL concern automatic adaptation to the particular non-stationary structure of the data and quantification of predictive uncertainty. Motivated by these challenges, we introduce a probabilistic Bayesian online learning model that combines a (possibly pretrained) neural representation with a state space model over the linear predictor weights. Non-stationarity of the linear predictor weights is modelled by a parameter drift transition density, parametrized by a coefficient that quantifies forgetting. Inference in the model is implemented with efficient Kalman filter recursions that track the posterior distribution over the linear weights, while online SGD updates of the transition dynamics coefficient allow the model to adapt to the non-stationarity seen in the data. Although the framework is developed assuming a linear Gaussian model, we also extend it to handle classification problems and to fine-tune the deep learning representation. In experiments on multi-class classification with datasets such as CIFAR-100 and CLOC, we demonstrate the predictive ability of the model and its flexibility in capturing non-stationarity.
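
As a concrete illustration, the sketch below shows Kalman filter recursions over the linear predictor weights with a single drift/forgetting coefficient, written for a scalar Gaussian observation model rather than the classification extension. The class name DriftKalmanFilter, the coefficient name gamma, and the specific transition-noise parametrization are illustrative assumptions, not the paper's exact formulation, and the online SGD update of the coefficient is omitted.

```python
# Minimal sketch (not the authors' code) of Kalman filter recursions over linear
# predictor weights w_t with a drift/forgetting coefficient gamma, assuming the
# scalar Gaussian observation model y_t = x_t^T w_t + noise.
import numpy as np

class DriftKalmanFilter:
    def __init__(self, dim, gamma=0.99, sigma2_w=1.0, sigma2_y=1.0):
        self.gamma = gamma               # drift / forgetting coefficient (illustrative)
        self.sigma2_w = sigma2_w         # stationary (prior) weight variance
        self.sigma2_y = sigma2_y         # observation noise variance
        self.m = np.zeros(dim)           # posterior mean over linear weights
        self.P = sigma2_w * np.eye(dim)  # posterior covariance over linear weights

    def predict(self, x):
        """Prediction step: propagate the weights through the drift transition
        w_t = gamma * w_{t-1} + eps,  eps ~ N(0, (1 - gamma^2) * sigma2_w * I),
        which keeps the stationary weight variance at sigma2_w, then form the
        predictive mean and variance for the input features x."""
        m_prior = self.gamma * self.m
        P_prior = (self.gamma ** 2) * self.P \
            + (1.0 - self.gamma ** 2) * self.sigma2_w * np.eye(len(self.m))
        mean = x @ m_prior
        var = x @ P_prior @ x + self.sigma2_y
        return mean, var, m_prior, P_prior

    def update(self, x, y):
        """Training step: standard Kalman update of the weight posterior after
        observing the target y for features x."""
        mean, var, m_prior, P_prior = self.predict(x)
        k = P_prior @ x / var                      # Kalman gain (scalar observation)
        self.m = m_prior + k * (y - mean)          # corrected posterior mean
        self.P = P_prior - np.outer(k, x @ P_prior)  # corrected posterior covariance
        return mean, var
```

A step of the stream would call predict(x) to obtain a predictive mean and variance, then update(x, y) once the target arrives, mirroring the prediction-then-training protocol of OCL; smaller values of gamma correspond to faster forgetting of past weights.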
