Coarse-to-Fine Curriculum Learning

When learning challenging new tasks, humans often follow a sequence of steps that lets them incrementally build up the necessary skills. In machine learning, however, models are most often trained to solve the target task directly. Inspired by human learning, we propose a novel curriculum learning approach that decomposes a challenging task into a sequence of easier intermediate goals, which are used to pre-train a model before it tackles the target task. We focus on classification tasks and design the intermediate tasks using an automatically constructed label hierarchy. We train the model at each level of the hierarchy, from coarse labels to fine labels, transferring acquired knowledge across levels. For instance, the model first learns to distinguish animals from objects, and then uses this knowledge when learning to classify more fine-grained classes such as cat, dog, car, and truck. Most existing curriculum learning algorithms for supervised learning schedule the order in which training examples are presented to the model. In contrast, our approach focuses on the output space of the model. We evaluate our method on several established datasets and show significant performance gains, especially on classification problems with many labels. We also evaluate on a new synthetic dataset that allows us to study multiple aspects of our method.
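
The coarse-to-fine schedule described in the abstract can be illustrated with a minimal sketch. Everything below is an assumption made for illustration and is not the paper's implementation: the two-level hierarchy is written by hand rather than constructed automatically, the model is a linear softmax classifier, the data are synthetic Gaussian blobs, and "transferring acquired knowledge" is approximated by initializing each class's weights from its parent cluster at the previous level. The names `train_softmax`, `coarse_to_fine`, and `levels` are hypothetical.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def train_softmax(X, y, W, epochs=200, lr=0.1):
    """Full-batch gradient descent on cross-entropy, starting from weights W."""
    n, k = X.shape[0], W.shape[1]
    Y = np.eye(k)[y]  # one-hot targets
    for _ in range(epochs):
        P = softmax(X @ W)
        W = W - lr * X.T @ (P - Y) / n
    return W

def coarse_to_fine(X, y_fine, levels, epochs=200, lr=0.1):
    """Train level by level, coarse first.

    levels: list of integer arrays; levels[i][c] is the cluster id of fine
    class c at level i. The last entry is the identity (the fine labels).
    """
    W, prev_map = None, None
    for mapping in levels:
        k = int(mapping.max()) + 1
        W_new = np.zeros((X.shape[1], k))
        if W is not None:
            # Assumed transfer rule: each cluster inherits its parent's weights.
            for c in range(len(mapping)):
                W_new[:, mapping[c]] = W[:, prev_map[c]]
        W = train_softmax(X, mapping[y_fine], W_new, epochs, lr)
        prev_map = mapping
    return W

# Synthetic data: fine classes 0=cat, 1=dog, 2=car, 3=truck.
rng = np.random.default_rng(0)
centers = np.array([[-3.0, -1.0], [-3.0, 1.0], [3.0, -1.0], [3.0, 1.0]])
y_fine = np.repeat(np.arange(4), 100)
X = centers[y_fine] + 0.3 * rng.standard_normal((400, 2))
X = np.hstack([X, np.ones((400, 1))])  # constant column acts as a bias term

# Hand-written hierarchy: {cat, dog} -> animals, {car, truck} -> objects.
levels = [np.array([0, 0, 1, 1]), np.array([0, 1, 2, 3])]

W = coarse_to_fine(X, y_fine, levels)
acc = float((np.argmax(X @ W, axis=1) == y_fine).mean())  # training accuracy
print(f"fine-level training accuracy: {acc:.3f}")
```

In this sketch the coarse stage only has to separate animals from objects; the fine stage then starts with cat and dog sharing the "animal" weights and only needs to learn what tells them apart. A deeper hierarchy would simply add more entries to `levels`.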
