Asymmetric Loss For Multi-Label Classification

Pictures of everyday life are inherently multi-label in nature. Hence, multi-label classification is commonly used to analyze their content. In typical multi-label datasets, each picture contains only a few positive labels, and many negative ones. This positive-negative imbalance can result in under-emphasizing gradients from positive labels during training, leading to poor accuracy. In this paper, we introduce a novel asymmetric loss ("ASL"), that operates differently on positive and negative samples. The loss dynamically down-weights the importance of easy negative samples, causing the optimization process to focus more on the positive samples, and also enables to discard mislabeled negative samples. We demonstrate how ASL leads to a more "balanced" network, with increased average probabilities for positive samples, and show how this balanced network is translated to better mAP scores, compared to commonly used losses. Furthermore, we offer a method that can dynamically adjust the level of asymmetry throughout the training. With ASL, we reach new state-of-the-art results on three common multi-label datasets, including achieving 86.6% on MS-COCO. We also demonstrate ASL applicability for other tasks such as fine-grain single-label classification and object detection. ASL is effective, easy to implement, and does not increase the training time or complexity. Implementation is available at: this https URL.

[1]  Chuang Gan,et al.  Once for All: Train One Network and Specialize it for Efficient Deployment , 2019, ICLR.

[2]  Luc Van Gool,et al.  The 2005 PASCAL Visual Object Classes Challenge , 2005, MLCW.

[3]  Xiu-Shen Wei,et al.  Multi-Label Image Recognition with Joint Class-Aware Map Disentangling and Label Correlation Embedding , 2019, 2019 IEEE International Conference on Multimedia and Expo (ICME).

[4]  Quoc V. Le,et al.  AutoAugment: Learning Augmentation Strategies From Data , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[5]  Quoc V. Le,et al.  Searching for MobileNetV3 , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[6]  Hongyuan Zha,et al.  Deep Extreme Multi-label Learning , 2017, ICMR.

[7]  Wei Xu,et al.  CNN-RNN: A Unified Framework for Multi-label Image Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[8]  Sid Ying-Ze Bao,et al.  Cross-Modality Attention with Semantic Graph Embedding for Multi-Label Classification , 2020, AAAI.

[9]  Yu Zhang,et al.  Exploit Bounding Box Annotations for Multi-Label Object Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[10]  Shiming Xiang,et al.  Multi-Label Image Classification via Knowledge Distillation from Weakly-Supervised Detection , 2018, ACM Multimedia.

[11]  Jianguo Zhang,et al.  The PASCAL Visual Object Classes Challenge , 2006 .

[12]  Graham W. Taylor,et al.  Improved Regularization of Convolutional Neural Networks with Cutout , 2017, ArXiv.

[13]  Kaiming He,et al.  Focal Loss for Dense Object Detection , 2017, 2017 IEEE International Conference on Computer Vision (ICCV).

[14]  Luc Van Gool,et al.  The Pascal Visual Object Classes Challenge: A Retrospective , 2014, International Journal of Computer Vision.

[15]  Frank Hutter,et al.  Decoupled Weight Decay Regularization , 2017, ICLR.

[16]  Ross B. Girshick,et al.  Focal Loss for Dense Object Detection , 2017, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[17]  Hao Chen,et al.  FCOS: Fully Convolutional One-Stage Object Detection , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[18]  Greg Mori,et al.  Learning a Deep ConvNet for Multi-Label Classification With Partial Labels , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[19]  Quoc V. Le,et al.  EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks , 2019, ICML.

[20]  Tat-Seng Chua,et al.  NUS-WIDE: a real-world web image database from National University of Singapore , 2009, CIVR '09.

[21]  Qian Wang,et al.  A Baseline for Multi-Label Image Classification Using an Ensemble of Deep Convolutional Neural Networks , 2019, 2019 IEEE International Conference on Image Processing (ICIP).

[22]  Kai Chen,et al.  MMDetection: Open MMLab Detection Toolbox and Benchmark , 2019, ArXiv.

[23]  Hefeng Wu,et al.  Learning Semantic-Specific Graph Representation for Multi-Label Image Recognition , 2019, 2019 IEEE/CVF International Conference on Computer Vision (ICCV).

[24]  Jian Sun,et al.  Deep Residual Learning for Image Recognition , 2015, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[25]  Pietro Perona,et al.  Microsoft COCO: Common Objects in Context , 2014, ECCV.

[26]  Bin-Bin Gao,et al.  Multi-Label Image Recognition with Multi-Class Attentional Regions , 2020, ArXiv.

[27]  Shifeng Zhang,et al.  Bridging the Gap Between Anchor-Based and Anchor-Free Detection via Adaptive Training Sample Selection , 2019, 2020 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[28]  Yang Song,et al.  Class-Balanced Loss Based on Effective Number of Samples , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[29]  Jinwen Ma,et al.  Multi-Label Classification with Label Graph Superimposing , 2019, AAAI.

[30]  Peter L. Bartlett,et al.  Classification with a Reject Option using a Hinge Loss , 2008, J. Mach. Learn. Res..

[31]  Stefano Soatto,et al.  Time Matters in Regularizing Deep Networks: Weight Decay and Data Augmentation Affect Early Learning Dynamics, Matter Little Near Convergence , 2019, NeurIPS.

[32]  Leslie N. Smith,et al.  A disciplined approach to neural network hyper-parameters: Part 1 - learning rate, batch size, momentum, and weight decay , 2018, ArXiv.

[33]  Xiu-Shen Wei,et al.  Multi-Label Image Recognition With Graph Convolutional Networks , 2019, 2019 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).

[34]  Itamar Friedman,et al.  TResNet: High Performance GPU-Dedicated Architecture , 2020, 2021 IEEE Winter Conference on Applications of Computer Vision (WACV).

[35]  Yu Qiao,et al.  Attention-Driven Dynamic Graph Convolutional Network for Multi-label Image Recognition , 2020, ECCV.

[36]  Chen Huang,et al.  Learning Deep Representation for Imbalanced Classification , 2016, 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR).

[37]  Yonghong Xie,et al.  Bi-Modal Learning With Channel-Wise Attention for Multi-Label Image Classification , 2020, IEEE Access.

[38]  Yu Wang,et al.  Distribution-Balanced Loss for Multi-Label Classification in Long-Tailed Datasets , 2020, ECCV.

[39]  Johannes Fürnkranz,et al.  Maximizing Subset Accuracy with Recurrent Neural Networks in Multi-label Classification , 2017, NIPS.

[40]  Iaira Boissevain,et al.  [For completeness]. , 2011, Tijdschrift voor diergeneeskunde.