Continual Learning on Noisy Data Streams via Self-Purified Replay

Continual learning in the real world must overcome many challenges, among which noisy labels are a common and inevitable issue. In this work, we present a replay-based continual learning framework that simultaneously addresses both catastrophic forgetting and noisy labels for the first time. Our solution is based on two observations: (i) forgetting can be mitigated even under noisy labels via self-supervised learning, and (ii) the purity of the replay buffer is crucial. Building on these observations, we propose two key components of our method: (i) a self-supervised replay technique named Self-Replay, which can circumvent erroneous training signals arising from noisily labeled data, and (ii) the Self-Centered filter, which maintains a purified replay buffer via centrality-based stochastic graph ensembles. Empirical results on MNIST, CIFAR-10, CIFAR-100, and WebVision with real-world noise demonstrate that our framework maintains a highly pure replay buffer amidst noisy streamed data while greatly outperforming combinations of state-of-the-art continual learning and noisy-label learning methods.
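Observation (i) says replay can be driven by a label-free objective, so corrupted labels never enter the gradient. The abstract does not commit to a specific loss, so the following is only a minimal sketch of that idea, assuming a SimCLR-style NT-Xent contrastive objective over two augmented views of buffer images; `encoder`, `projector`, `augment`, and all hyperparameters are illustrative assumptions, not the paper's configuration.

```python
# Hedged sketch of the Self-Replay idea: replay-buffer samples are used for a
# self-supervised (label-free) update, so noisy labels cannot inject erroneous
# training signals. The contrastive objective below is an assumption.
import torch
import torch.nn.functional as F

def nt_xent_loss(z1: torch.Tensor, z2: torch.Tensor,
                 temperature: float = 0.5) -> torch.Tensor:
    """SimCLR-style NT-Xent loss over two batches of embeddings, each (N, D)."""
    n = z1.size(0)
    z = F.normalize(torch.cat([z1, z2], dim=0), dim=1)   # (2N, D), unit norm
    sim = z @ z.t() / temperature                        # (2N, 2N) similarities
    mask = torch.eye(2 * n, dtype=torch.bool, device=z.device)
    sim = sim.masked_fill(mask, -1e9)                    # exclude self-pairs
    # The positive for row i is row i+N (and vice versa).
    targets = torch.cat([torch.arange(n, 2 * n), torch.arange(n)]).to(z.device)
    return F.cross_entropy(sim, targets)

def self_replay_step(encoder, projector, buffer_images, augment, optimizer):
    """One label-free replay step; buffer labels are deliberately ignored."""
    v1, v2 = augment(buffer_images), augment(buffer_images)  # two stochastic views
    z1, z2 = projector(encoder(v1)), projector(encoder(v2))
    loss = nt_xent_loss(z1, z2)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```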
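The Self-Centered filter is described only at a high level ("centrality-based stochastic graph ensembles"), so the sketch below fills in plausible details that should be read as assumptions rather than the paper's algorithm: within each claimed class, candidates are scored by eigenvector centrality (via power iteration) on a cosine-similarity graph, the scores are averaged over an ensemble of randomly edge-dropped graphs, and the most central samples are kept. The intuition is that clean same-class samples form a dense, mutually similar cluster with high centrality, while mislabeled outliers sit on the graph's periphery.

```python
# Hedged sketch of a Self-Centered-style purification step. The perturbation
# scheme (random edge dropping), threshold, and ensemble size are assumptions.
import torch
import torch.nn.functional as F

def eigenvector_centrality(adj: torch.Tensor, iters: int = 50) -> torch.Tensor:
    """Power iteration for the principal eigenvector of an (N, N) adjacency."""
    c = torch.full((adj.size(0),), 1.0 / adj.size(0))
    for _ in range(iters):
        c = adj @ c
        c = c / (c.norm() + 1e-12)
    return c

def self_centered_filter(feats: torch.Tensor, keep: int,
                         n_graphs: int = 10, drop_p: float = 0.3) -> torch.Tensor:
    """Return indices of the `keep` most central candidates of one class.

    feats: (N, D) embeddings of all buffer candidates claiming the same label.
    """
    z = F.normalize(feats, dim=1)
    sim = (z @ z.t()).clamp(min=0.0)         # non-negative cosine similarities
    sim.fill_diagonal_(0.0)                  # no self-loops
    scores = torch.zeros(feats.size(0))
    for _ in range(n_graphs):                # stochastic graph ensemble
        edge_mask = (torch.rand_like(sim) > drop_p).float()
        edge_mask = torch.minimum(edge_mask, edge_mask.t())  # keep graph symmetric
        scores += eigenvector_centrality(sim * edge_mask)
    return torch.topk(scores, k=min(keep, feats.size(0))).indices
```

In this reading, the ensemble averaging makes the centrality ranking robust to any single spurious edge, which is one way to interpret "stochastic graph ensembles" in the abstract.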
