Deep Bayesian Unsupervised Lifelong Learning

Lifelong Learning (LL) refers to the ability to continually learn and solve new problems from incrementally available information over time while retaining previously acquired knowledge. Much recent attention has been given to Supervised Lifelong Learning (SLL), which assumes a stream of labelled data. In contrast, we focus on the challenges of Unsupervised Lifelong Learning (ULL), where the data stream is unlabelled and both the data distribution and the unknown class labels evolve over time. A Bayesian framework is a natural way to incorporate past knowledge and to sequentially update beliefs as new data arrive. We develop a fully Bayesian inference framework for ULL with a novel end-to-end algorithm, Deep Bayesian Unsupervised Lifelong Learning (DBULL), which progressively discovers new clusters in unlabelled data without forgetting the past while simultaneously learning latent representations. To maintain past knowledge efficiently, we develop a novel knowledge-preservation mechanism based on sufficient statistics of the latent representations of the raw data. To detect potential new clusters on the fly, our inference includes an automatic cluster-discovery and redundancy-removal strategy inspired by techniques from nonparametric Bayesian statistics. We demonstrate the effectiveness of our approach on image and text benchmark corpora in both the lifelong and batch settings.
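The knowledge-preservation idea above can be illustrated with a minimal sketch: instead of storing raw data from past tasks, each cluster keeps only sufficient statistics (count, sum, and sum of outer products) of its latent representations, which can be updated in a streaming fashion and later used to recover the cluster mean and covariance. The class name and API below are hypothetical illustrations, not the authors' implementation.

```python
import numpy as np

class ClusterSufficientStats:
    """Per-cluster sufficient statistics of latent vectors.

    Retains past knowledge in O(d^2) memory per cluster, regardless of
    how many data points have been seen, so raw data need not be stored.
    (Illustrative sketch; the actual DBULL mechanism may differ.)
    """

    def __init__(self, dim):
        self.n = 0.0                   # number of latent vectors absorbed
        self.s = np.zeros(dim)         # running sum of latent vectors
        self.S = np.zeros((dim, dim))  # running sum of outer products

    def update(self, z):
        """Absorb one latent vector z in a single streaming pass."""
        self.n += 1.0
        self.s += z
        self.S += np.outer(z, z)

    def mean(self):
        """Cluster mean recovered from the sufficient statistics."""
        return self.s / self.n

    def cov(self):
        """Cluster covariance recovered from the sufficient statistics."""
        m = self.mean()
        return self.S / self.n - np.outer(m, m)
```

Because the statistics are additive, statistics gathered on earlier data chunks can simply be carried forward and merged with those from new chunks, which is what makes this representation convenient in a streaming, lifelong setting.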
