A Comprehensive Survey on Curriculum Learning

Curriculum learning (CL) is a training strategy that presents a machine learning model with easier data before harder data, imitating the meaningful learning order in human curricula. As an easy-to-use plug-in, the CL strategy has demonstrated its power in improving the generalization capacity and convergence rate of various models in a wide range of scenarios such as computer vision and natural language processing. In this survey article, we comprehensively review CL from various aspects including motivations, definitions, theories, and applications. We discuss works on curriculum learning within a general CL framework, elaborating on how to design a manually predefined curriculum or an automatic curriculum. In particular, we summarize existing CL designs based on the general framework of Difficulty Measurer + Training Scheduler and further categorize the methodologies for automatic CL into four groups, i.e., Self-paced Learning, Transfer Teacher, RL Teacher, and Other Automatic CL. Finally, we present brief discussions on the relationships between CL and other methods, and point out potential future research directions deserving further investigation.
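To make the Difficulty Measurer + Training Scheduler framework concrete, the following is a minimal Python sketch of a predefined (easy-to-hard) curriculum loop. It is an illustrative assumption, not an implementation from any surveyed work; the function names (difficulty_measurer, training_scheduler, train_one_epoch) and the linear scheduler values are hypothetical placeholders.

```python
# Minimal sketch of the generic Difficulty Measurer + Training Scheduler loop.
# All callables and constants below are illustrative assumptions, not APIs
# from the surveyed papers.
from typing import Callable, List, Sequence

def curriculum_train(
    dataset: Sequence,                               # full training set
    difficulty_measurer: Callable[[object], float],  # example -> difficulty score
    training_scheduler: Callable[[int], float],      # epoch -> fraction of data exposed
    train_one_epoch: Callable[[List[object]], None], # model update on the selected subset
    num_epochs: int,
) -> None:
    # Predefined CL: score every example once and rank from easy (low) to hard (high).
    ranked = sorted(dataset, key=difficulty_measurer)
    for epoch in range(num_epochs):
        # The scheduler decides how much of the easy-to-hard ranking is visible.
        fraction = min(max(training_scheduler(epoch), 0.0), 1.0)
        cutoff = max(1, int(fraction * len(ranked)))
        train_one_epoch(list(ranked[:cutoff]))

# A simple linear scheduler: start from 20% of the data and reach the full
# set by epoch 50 (hypothetical values chosen only for illustration).
def linear_scheduler(epoch: int, start: float = 0.2, full_at: int = 50) -> float:
    return start + (1.0 - start) * min(epoch / full_at, 1.0)
```

Automatic CL variants (e.g., Self-paced Learning or an RL Teacher) would replace the fixed ranking or the fixed scheduler with quantities updated from the model's own training signal.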
