Memory Efficient Experience Replay for Streaming Learning

In supervised machine learning, an agent is typically trained once and then deployed. While this works well for static settings, robots often operate in changing environments and must quickly learn new things from data streams. In this paradigm, known as streaming learning, a learner is trained online, in a single pass, from a data stream that cannot be assumed to be independent and identically distributed (iid). Streaming learning will cause conventional deep neural networks (DNNs) to fail for two reasons: 1) they need multiple passes through the entire dataset; and 2) non-iid data will cause catastrophic forgetting. An old fix to both of these issues is rehearsal. To learn a new example, rehearsal mixes it with previous examples, and then this mixture is used to update the DNN. Full rehearsal is slow and memory intensive because it stores all previously observed examples, and its effectiveness for preventing catastrophic forgetting has not been studied in modern DNNs. Here, we describe the ExStream algorithm for memory efficient rehearsal and compare it to alternatives. We find that full rehearsal can eliminate catastrophic forgetting in a variety of streaming learning settings, with ExStream performing well using far less memory and computation.

[1]  Ruoming Jin,et al.  Efficient decision tree construction on streaming data , 2003, KDD '03.

[2]  Giorgio Metta,et al.  iCub World: Friendly Robots Help Building Good Vision Data-Sets , 2013, 2013 IEEE Conference on Computer Vision and Pattern Recognition Workshops.

[3]  Geoff Holmes,et al.  Scalable and efficient multi-label classification for evolving data streams , 2012, Machine Learning.

[4]  Davide Maltoni,et al.  Continuous Learning in Single-Incremental-Task Scenarios , 2018, Neural Networks.

[5]  Longxin Lin Self-Improving Reactive Agents Based on Reinforcement Learning, Planning and Teaching , 2004, Machine Learning.

[6]  Philip S. Yu,et al.  Mining concept-drifting data streams using ensemble classifiers , 2003, KDD '03.

[7]  Philip S. Yu,et al.  A Framework for Clustering Evolving Data Streams , 2003, VLDB.

[8]  Geoff Holmes,et al.  New ensemble methods for evolving data streams , 2009, KDD.

[9]  James L. McClelland,et al.  Why there are complementary learning systems in the hippocampus and neocortex: insights from the successes and failures of connectionist models of learning and memory. , 1995, Psychological review.

[10]  Philip S. Yu,et al.  A Framework for Projected Clustering of High Dimensional Data Streams , 2004, VLDB.

[11]  Bernard Zenko,et al.  Speeding-Up Hoeffding-Based Regression Trees With Options , 2011, ICML.

[12]  Alexander Gepperth,et al.  A Bio-Inspired Incremental Learning Architecture for Applied Perceptual Problems , 2016, Cognitive Computation.

[13]  Geoff Holmes,et al.  MOA: Massive Online Analysis , 2010, J. Mach. Learn. Res..

[14]  Surya Ganguli,et al.  Continual Learning Through Synaptic Intelligence , 2017, ICML.

[15]  L. Nadel,et al.  Memory consolidation, retrograde amnesia and the hippocampal complex , 1997, Current Opinion in Neurobiology.

[16]  Geoff Hulten,et al.  Mining time-changing data streams , 2001, KDD '01.

[17]  Jinli Cao,et al.  Combination of information entropy and ensemble classification for detecting concept drift in data stream , 2018, ACSW.

[18]  Philip S. Yu,et al.  On demand classification of data streams , 2004, KDD.

[19]  Sudipto Guha,et al.  Clustering Data Streams: Theory and Practice , 2003, IEEE Trans. Knowl. Data Eng..

[20]  Maxim Sviridenko,et al.  An Algorithm for Online K-Means Clustering , 2014, ALENEX.

[21]  Ronald Kemker,et al.  FearNet: Brain-Inspired Model for Incremental Learning , 2017, ICLR.

[22]  Charles Elkan,et al.  Scalability for clustering algorithms revisited , 2000, SKDD.

[23]  Geoff Holmes,et al.  New Options for Hoeffding Trees , 2007, Australian Conference on Artificial Intelligence.

[24]  Thanapat Kangkachit,et al.  SE-Stream: Dimension Projection for Evolution-Based Clustering of High Dimensional Data Streams , 2013, KSE.

[25]  Ricard Gavaldà,et al.  Adaptive Learning from Evolving Data Streams , 2009, IDA.

[26]  Michael Hahsler,et al.  Introduction to stream: An Extensible Framework for Data Stream Clustering Research with R , 2017 .

[27]  Mohamed Medhat Gaber,et al.  A Survey of Classification Methods in Data Streams , 2007, Data Streams - Models and Algorithms.

[28]  Geoff Hulten,et al.  Mining high-speed data streams , 2000, KDD '00.

[29]  James R. Williamson,et al.  Gaussian ARTMAP: A Neural Network for Fast Incremental Learning of Noisy Multidimensional Maps , 1996, Neural Networks.

[30]  Michael Hahsler,et al.  Interface for MOA Stream Clustering Algorithms , 2015 .

[31]  Chrisantha Fernando,et al.  PathNet: Evolution Channels Gradient Descent in Super Neural Networks , 2017, ArXiv.

[32]  Michael S. Bernstein,et al.  ImageNet Large Scale Visual Recognition Challenge , 2014, International Journal of Computer Vision.

[33]  Itamar Arel,et al.  This article has been accepted for inclusion in a future issue of this journal. Content is final as presented, with the exception of pagination. IEEE TRANSACTIONS ON NEURAL NETWORKS AND LEARNING SYSTEMS 1 Ensemble Learning in Fixed Expansion Layer Network , 2022 .

[34]  Michael Hahsler,et al.  Infrastructure for Data Stream Mining , 2015 .

[35]  Christian Sohler,et al.  StreamKM++: A clustering algorithm for data streams , 2010, JEAL.

[36]  Qiang Yang,et al.  Boosting for transfer learning , 2007, ICML '07.

[37]  Derya Birant,et al.  ST-DBSCAN: An algorithm for clustering spatial-temporal data , 2007, Data Knowl. Eng..

[38]  Stephen Grossberg,et al.  Fuzzy ARTMAP: A neural network architecture for incremental supervised learning of analog multidimensional maps , 1992, IEEE Trans. Neural Networks.

[39]  Christian Sohler,et al.  BICO: BIRCH Meets Coresets for k-Means Clustering , 2013, ESA.

[40]  Manfred K. Warmuth,et al.  The Weighted Majority Algorithm , 1994, Inf. Comput..

[41]  Mohamed Medhat Gaber,et al.  Density-Based Projected Clustering of Data Streams , 2012, SUM.

[42]  W. Abraham,et al.  Memory retention – the synaptic stability versus plasticity dilemma , 2005, Trends in Neurosciences.

[43]  Geoffrey E. Hinton Using fast weights to deblur old memories , 1987 .

[44]  R. French Catastrophic forgetting in connectionist networks , 1999, Trends in Cognitive Sciences.

[45]  Marc'Aurelio Ranzato,et al.  Gradient Episodic Memory for Continual Learning , 2017, NIPS.

[46]  Davide Maltoni,et al.  CORe50: a New Dataset and Benchmark for Continuous Object Recognition , 2017, CoRL.

[47]  Tian Zhang,et al.  BIRCH: an efficient data clustering method for very large databases , 1996, SIGMOD '96.

[48]  Aoying Zhou,et al.  Density-Based Clustering over an Evolving Data Stream with Noise , 2006, SDM.

[49]  Hans-Peter Kriegel,et al.  Incremental OPTICS: Efficient Computation of Updates in a Hierarchical Cluster Ordering , 2003, DaWaK.

[50]  Ronald Kemker,et al.  Measuring Catastrophic Forgetting in Neural Networks , 2017, AAAI.

[51]  Vasant Honavar,et al.  Learn++: an incremental learning algorithm for supervised neural networks , 2001, IEEE Trans. Syst. Man Cybern. Part C.

[52]  Talel Abdessalem,et al.  Adaptive random forests for evolving data stream classification , 2017, Machine Learning.

[53]  Jeffrey Scott Vitter,et al.  Random sampling with a reservoir , 1985, TOMS.

[54]  Pietro Perona,et al.  The Caltech-UCSD Birds-200-2011 Dataset , 2011 .

[55]  Michael Hahsler,et al.  Clustering Data Streams Based on Shared Density between Micro-Clusters , 2016, IEEE Transactions on Knowledge and Data Engineering.

[56]  Ira Assent,et al.  The ClusTree: indexing micro-clusters for anytime stream mining , 2011, Knowledge and Information Systems.

[57]  Tian Zhang,et al.  BIRCH: A New Data Clustering Algorithm and Its Applications , 1997, Data Mining and Knowledge Discovery.

[58]  João Gama,et al.  Decision trees for mining data streams , 2006, Intell. Data Anal..

[59]  Derek Hoiem,et al.  Learning without Forgetting , 2016, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[60]  Gail A. Carpenter,et al.  Biased ART: A neural architecture that shifts attention toward previously disregarded features following an incorrect prediction , 2010, Neural Networks.

[61]  Hans-Peter Kriegel,et al.  Density-based Projected Clustering over High Dimensional Data Streams , 2012, SDM.

[62]  Saso Dzeroski,et al.  Learning model trees from evolving data streams , 2010, Data Mining and Knowledge Discovery.

[63]  Christoph H. Lampert,et al.  iCaRL: Incremental Classifier and Representation Learning , 2016, 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[64]  João Gama,et al.  Ensemble learning for data stream analysis: A survey , 2017, Inf. Fusion.

[65]  Marcus A. Maloof,et al.  Dynamic Weighted Majority: An Ensemble Method for Drifting Concepts , 2007, J. Mach. Learn. Res..

[66]  João Bártolo Gomes,et al.  Scalable real-time classification of data streams with concept drift , 2017, Future Gener. Comput. Syst..

[67]  David H. Wolpert,et al.  Stacked generalization , 1992, Neural Networks.

[68]  Stephen Grossberg,et al.  ARTMAP: supervised real-time learning and classification of nonstationary data by a self-organizing neural network , 1991, [1991 Proceedings] IEEE Conference on Neural Networks for Ocean Engineering.

[69]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[70]  Hongzhi Wang,et al.  Life-long learning based on dynamic combination model , 2017, Appl. Soft Comput..

[71]  Jerzy Stefanowski,et al.  Reacting to Different Types of Concept Drift: The Accuracy Updated Ensemble Algorithm , 2014, IEEE Transactions on Neural Networks and Learning Systems.

[72]  Li Tu,et al.  Density-based clustering for real-time stream data , 2007, KDD '07.

[73]  K. Ming Leung,et al.  Learning Vector Quantization , 2017, Encyclopedia of Machine Learning and Data Mining.

[74]  Razvan Pascanu,et al.  Overcoming catastrophic forgetting in neural networks , 2016, Proceedings of the National Academy of Sciences.

[75]  Geoffrey I. Webb,et al.  Extremely Fast Decision Tree , 2018, KDD.

[76]  Michael McCloskey,et al.  Catastrophic Interference in Connectionist Networks: The Sequential Learning Problem , 1989 .