No Cost Likelihood Manipulation at Test Time for Making Better Mistakes in Deep Networks

There has been increasing interest in building deep hierarchy-aware classifiers that aim to quantify and reduce the severity of mistakes, not just the number of errors. The idea is to exploit a label hierarchy (e.g., the WordNet ontology) and treat graph distance between classes as a proxy for mistake severity. Surprisingly, on examining the mistake-severity distributions of the top-1 prediction, we find that current state-of-the-art hierarchy-aware deep classifiers do not always show a practical improvement over the standard cross-entropy baseline in making better mistakes. The reduction in average mistake severity can instead be attributed to an increase in low-severity mistakes, which may also explain the noticeable drop in their accuracy. Motivated by this observation, we use the classical Conditional Risk Minimization (CRM) framework for hierarchy-aware classification. Given a cost matrix and a reliable estimate of the likelihoods (obtained from a trained network), CRM simply amends mistakes at inference time; it needs no extra hyperparameters and requires adding only a few lines of code to the standard cross-entropy baseline. It significantly outperforms the state of the art and consistently achieves large reductions in the average hierarchical distance of top-$k$ predictions across datasets, with very little loss in accuracy. Because of its simplicity, CRM can be used with any off-the-shelf trained model that provides reliable likelihood estimates.
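
Since CRM needs only the softmax likelihoods of an already-trained model and a class-to-class cost matrix, the test-time re-ranking can indeed be written in a few lines. The sketch below is illustrative rather than the authors' released code: the function name `crm_predict`, its arguments, and the toy cost matrix are assumptions, with the cost matrix standing in for pairwise hierarchical distances (e.g., lowest-common-ancestor or shortest-path distances in an ontology such as WordNet).

```python
import numpy as np

def crm_predict(probs, cost_matrix, k=1):
    """Re-rank predictions by expected hierarchical cost (CRM sketch).

    probs:       (N, C) softmax probabilities from a trained classifier.
    cost_matrix: (C, C) matrix where cost_matrix[i, j] is the cost of
                 predicting class i when the true class is j (e.g., the
                 graph distance between the two classes in the hierarchy).
    Returns the indices of the k lowest-risk classes for each example.
    """
    # Expected cost (risk) of predicting class i for example n:
    #   risk[n, i] = sum_j cost_matrix[i, j] * probs[n, j]
    risk = probs @ cost_matrix.T
    # Sort classes by increasing risk instead of decreasing probability.
    return np.argsort(risk, axis=1)[:, :k]


# Toy usage: 3 classes, a hypothetical symmetric cost matrix, one softmax output.
cost = np.array([[0., 1., 2.],
                 [1., 0., 2.],
                 [2., 2., 0.]])
probs = np.array([[0.40, 0.35, 0.25]])
print(crm_predict(probs, cost, k=3))  # classes ordered by expected cost
```

Note that when `cost_matrix` is the 0-1 loss (all off-diagonal entries equal), this ranking reduces to ordering classes by probability, so the standard argmax prediction is recovered as a special case.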
