Lifelong machine learning: a paradigm for continuous learning

Lifelong Machine Learning (or Lifelong Learning) is an advanced machine learning paradigm that learns continuously, accumulates the knowledge learned in previous tasks, and uses it to help future learning. In the process, the learner becomes more knowledgeable and more effective at learning. This learning ability is one of the hallmarks of human intelligence. However, the currently dominant machine learning paradigm learns in isolation: given a training dataset, it runs a machine learning algorithm on the dataset to produce a model. It makes no attempt to retain the learned knowledge and use it in future learning. Although this isolated learning paradigm has been very successful, it requires a large number of training examples and is suitable only for well-defined, narrow tasks. In comparison, humans can learn effectively from a few examples because we have accumulated so much knowledge in the past, which enables us to learn with little data or effort. Lifelong learning aims to achieve this capability. As statistical machine learning matures, it is time to make a major effort to break with the isolated learning tradition and to study lifelong learning in order to bring machine learning to new heights. Applications such as intelligent assistants, chatbots, and physical robots that interact with humans and systems in real-life environments also call for such lifelong learning capabilities. Without the ability to accumulate learned knowledge and use it to learn more knowledge incrementally, a system will probably never be truly intelligent. This book serves as an introductory text on, and a survey of, lifelong learning.
