Hierarchical Class-Based Curriculum Loss

Classification algorithms in machine learning often assume a flat label space. However, most real-world data have dependencies between labels, which can often be captured by a hierarchy. Exploiting this relation can help develop a model that satisfies the dependencies and improves accuracy and interpretability. Further, because different levels in the hierarchy correspond to different granularities, penalizing each label equally can be detrimental to model learning. In this paper, we propose a loss function, hierarchical curriculum loss, with two properties: (i) it satisfies the hierarchical constraints present in the label space, and (ii) it assigns non-uniform weights to labels based on their levels in the hierarchy, learned implicitly by the training paradigm. We theoretically show that the proposed loss function is a tighter bound on the 0-1 loss than any other loss satisfying the hierarchical constraints. We test our loss function on real-world image data sets and show that it substantially outperforms multiple baselines.
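
To make the terminology concrete, the following is a minimal sketch of what a label hierarchy, a label's level, and a hierarchically consistent prediction could look like. It is not the proposed loss: the toy hierarchy `PARENT`, the helper names, and the reading of "hierarchical constraint" as "every predicted label comes with all of its ancestors" are assumptions made only for illustration.

```python
from typing import Dict, Optional, Set

# Hypothetical toy hierarchy (child -> parent); None marks a root label.
PARENT: Dict[str, Optional[str]] = {
    "animal": None,
    "dog": "animal",
    "cat": "animal",
    "beagle": "dog",
}


def ancestors(label: str, parent: Dict[str, Optional[str]] = PARENT) -> Set[str]:
    """Return all ancestors of `label` by walking up to the root."""
    out: Set[str] = set()
    node = parent[label]
    while node is not None:
        out.add(node)
        node = parent[node]
    return out


def level(label: str, parent: Dict[str, Optional[str]] = PARENT) -> int:
    """Depth of `label` in the hierarchy (roots are at level 0)."""
    return len(ancestors(label, parent))


def satisfies_hierarchy(predicted: Set[str],
                        parent: Dict[str, Optional[str]] = PARENT) -> bool:
    """Assumed constraint: a predicted label implies all of its ancestors."""
    return all(ancestors(lab, parent) <= predicted for lab in predicted)


def zero_one_loss_per_label(predicted: Set[str], target: Set[str],
                            parent: Dict[str, Optional[str]] = PARENT) -> Dict[str, int]:
    """Plain per-label 0-1 loss: 1 where prediction and target disagree."""
    return {lab: int((lab in predicted) != (lab in target)) for lab in parent}
```

In this sketch, `{"animal", "dog", "beagle"}` is a consistent prediction while `{"beagle"}` alone is not. A flat per-label 0-1 loss treats an error at level 0 and an error at level 2 identically, which is the uniform weighting the abstract argues against.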
