Representative Task Self-Selection for Flexible Clustered Lifelong Learning

Consider the lifelong machine learning paradigm, whose objective is to learn a sequence of tasks by drawing on previous experience, e.g., a knowledge library or deep network weights. However, the knowledge libraries or deep networks in most recent lifelong learning models are of prescribed size, which can degrade performance on both learned and incoming tasks when faced with a new task environment (cluster). To address this challenge, we propose a novel incremental clustered lifelong learning framework with two knowledge libraries, a feature learning library and a model knowledge library, called Flexible Clustered Lifelong Learning (FCL³). Specifically, the feature learning library, modeled by an autoencoder architecture, maintains a set of representations common to all observed tasks, while the model knowledge library is self-selected by identifying and adding new representative models (clusters). When a new task arrives, our FCL³ model first transfers knowledge from these libraries to encode it, i.e., it effectively and selectively soft-assigns the new task to multiple representative models over the feature learning library. Then: 1) a new task with a higher outlier probability is judged to be a new representative, and is used to refine both the feature learning library and the representative models over time; or 2) a new task with a lower outlier probability only refines the feature learning library. For model optimization, we cast this lifelong learning problem as an alternating direction minimization problem solved as each new task arrives. Finally, we evaluate the proposed framework on several multitask data sets, and the experimental results demonstrate that our FCL³ model achieves better performance than most lifelong learning frameworks, and even batch clustered multitask learning models.
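The per-task workflow described above can be sketched in code. The following is a minimal, hypothetical simplification, not the paper's actual optimization: it represents each task by a weight vector, scores the new task's outlier probability by its reconstruction residual over the current representative models, appends it as a new representative when the score exceeds a threshold `tau`, and takes one gradient step to refine the feature learning library. The names `process_new_task`, `tau`, and the normalized-residual outlier score are illustrative assumptions, not the FCL³ formulation.

```python
import numpy as np

def process_new_task(w_new, reps, L, tau=0.5, lr=0.1):
    """One sketch of an FCL3-style update (hypothetical simplification).

    w_new : (d,) model vector of the newly arrived task
    reps  : list of (d,) representative model vectors (the model library)
    L     : (d, k) feature learning library (shared dictionary)
    """
    if reps:
        R = np.stack(reps, axis=1)  # d x m matrix of representatives
        # Soft-assign the new task over representatives (least squares)
        coeffs, *_ = np.linalg.lstsq(R, w_new, rcond=None)
        residual = np.linalg.norm(w_new - R @ coeffs)
        # Normalized residual serves as a stand-in for outlier probability
        outlier_score = residual / (np.linalg.norm(w_new) + 1e-12)
    else:
        outlier_score = 1.0  # first task is always a representative

    if outlier_score > tau:
        reps.append(w_new.copy())  # new representative model (cluster)

    # Refine the feature library: one gradient step on 0.5 * ||w - L s||^2
    s, *_ = np.linalg.lstsq(L, w_new, rcond=None)
    L += lr * np.outer(w_new - L @ s, s)
    return reps, L, outlier_score
```

A task that is well reconstructed by existing representatives (low outlier score) leaves the model library unchanged and only nudges the feature library, mirroring case 2) above.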
