Bayes Risk for Large Scale Hierarchical Top-K Image Classification

Despite numerous efforts and recent progress, image classification remains a challenging problem in which computers are still outperformed by humans. In particular, large-scale image classification (thousands of images, hundreds of classes, high-dimensional features), made popular by the ImageNet dataset [13], has received growing interest. Yet the standard evaluation protocol, which reports only the misclassification rate, fails to produce well-behaved classifiers. The inherent difficulty of large-scale datasets causes even human beings to fail at the classification task at times (the three ImageNet classes “softball”, “hardball” and “professional baseball”, for instance, are almost indistinguishable). Even when they fail, however, humans always predict an output that is semantically similar to the correct one. A hierarchy between concepts was therefore introduced to define an inter-class distance that penalizes classifiers outputting far-fetched labels, as in the 2010 and 2011 editions of the ImageNet Large-Scale Visual Recognition Challenge (ILSVRC). Another recent trend in classification protocols is to allow classifiers to output several guesses, of which only the best one is taken into account. Also in use in ILSVRC, this leniency accommodates dataset imperfections and ambiguous images. The purpose of this master's thesis is twofold. In the first part, we introduce Minimum Bayes Risk prediction to solve the problem of large-scale hierarchical top-K classification. Using an approximation of a submodular score and posterior class probabilities given by a logistic regression, we obtain significant improvements over the naive prediction. In the second part, we report preliminary work on improving the estimation of the posterior probabilities with a new classifier called the Bayes Risk Machine. We report good improvements on top-1.
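To make the idea of Minimum Bayes Risk top-K prediction concrete, here is a minimal sketch, not the thesis's actual implementation. It assumes that the top-K cost of a guess set is the smallest hierarchical cost among its labels, that the hierarchy is summarized by a cost matrix `cost[y, c]` (for example a tree distance), and that the guess set is built greedily, a standard approximation for this kind of submodular objective. The function name `greedy_mbr_top_k`, the toy cost matrix, and the toy posterior are all hypothetical and chosen only for illustration.

```python
import numpy as np


def greedy_mbr_top_k(posterior, cost, k):
    """Greedily pick k labels that approximately minimize the expected top-k cost.

    posterior: shape (n,), estimated p(y | x) for each class y.
    cost: shape (n, n), cost[y, c] = penalty for guessing c when the truth is y.
    Assumed top-k cost of a guess set S for truth y: min over c in S of cost[y, c].
    """
    n = posterior.shape[0]
    # best[y] = cost of the best guess selected so far if the true class is y.
    best = np.full(n, cost.max(), dtype=float)
    available = np.ones(n, dtype=bool)
    chosen = []
    for _ in range(k):
        # candidate[y, c] = cost for truth y if label c were added to the set.
        candidate = np.minimum(best[:, None], cost)
        # Expected cost of each enlarged set under the posterior.
        risk = posterior @ candidate
        risk[~available] = np.inf  # never pick the same label twice
        c = int(np.argmin(risk))
        chosen.append(c)
        available[c] = False
        best = candidate[:, c]
    return chosen, float(posterior @ best)


# Toy hierarchy (hypothetical): classes {0, 1} share a parent, as do {2, 3}.
# Cost is 0 for the correct label, 1 for a sibling, 2 for the other branch.
toy_cost = np.array([
    [0.0, 1.0, 2.0, 2.0],
    [1.0, 0.0, 2.0, 2.0],
    [2.0, 2.0, 0.0, 1.0],
    [2.0, 2.0, 1.0, 0.0],
])
toy_posterior = np.array([0.40, 0.35, 0.15, 0.10])

mbr_labels, mbr_risk = greedy_mbr_top_k(toy_posterior, toy_cost, k=2)

# Naive prediction: the two most probable classes, ignoring the hierarchy.
naive_labels = [int(i) for i in np.argsort(-toy_posterior)[:2]]
naive_risk = float(toy_posterior @ toy_cost[:, naive_labels].min(axis=1))
```

On this toy problem the greedy set is `[0, 2]` with an expected cost of about 0.45, whereas the two most probable classes `[0, 1]` give 0.5: spending the second guess on the other branch of the hierarchy hedges against a far-fetched error.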

[1]  Alan L. Yuille, et al.  The Concave-Convex Procedure, 2003, Neural Computation.

[2]  J. Nocedal, et al.  A Limited Memory Algorithm for Bound Constrained Optimization, 1995, SIAM J. Sci. Comput.

[3]  Antoine Cornuéjols, et al.  Apprentissage artificiel - Concepts et algorithmes, 2003.

[4]  G. Griffin, et al.  Caltech-256 Object Category Dataset, 2007.

[5]  M. L. Fisher, et al.  An analysis of approximations for maximizing submodular set functions—I, 1978, Math. Program.

[6]  Koby Crammer, et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines, 2002, J. Mach. Learn. Res.

[7]  Sebastian Nowozin, et al.  Structured Learning and Prediction in Computer Vision, 2011, Found. Trends Comput. Graph. Vis.

[8]  Pietro Perona, et al.  Learning Generative Visual Models from Few Training Examples: An Incremental Bayesian Approach Tested on 101 Object Categories, 2004, 2004 Conference on Computer Vision and Pattern Recognition Workshop.

[9]  Thomas Hofmann, et al.  Hierarchical document categorization with support vector machines, 2004, CIKM '04.

[10]  A. Klaser, et al.  Human Detection and Action Recognition in Video Sequences - Human Character Recognition in TV-Style Movies, 2006.

[11]  Christoph H. Lampert, et al.  Beyond sliding windows: Object localization by efficient subwindow search, 2008, 2008 IEEE Conference on Computer Vision and Pattern Recognition.

[12]  William H. Press, et al.  Numerical Recipes 3rd Edition: The Art of Scientific Computing, 2007.

[13]  Matthijs C. Dorst.  Distinctive Image Features from Scale-Invariant Keypoints, 2011.

[14]  Charles Kemp, et al.  How to Grow a Mind: Statistics, Structure, and Abstraction, 2011, Science.

[15]  Christopher M. Bishop, et al.  Pattern Recognition and Machine Learning (Information Science and Statistics), 2006.

[16]  Luc Van Gool, et al.  The Pascal Visual Object Classes (VOC) Challenge, 2010, International Journal of Computer Vision.

[17]  Bernhard E. Boser, et al.  A training algorithm for optimal margin classifiers, 1992, COLT '92.

[18]  Michel Minoux, et al.  Accelerated greedy algorithms for maximizing submodular set functions, 1978.

[19]  Juho Rousu, et al.  Kernel-Based Learning of Hierarchical Multilabel Classification Models, 2006, J. Mach. Learn. Res.

[20]  Fei-Fei Li, et al.  What Does Classifying More Than 10,000 Image Categories Tell Us?, 2010, ECCV.

[21]  Florent Perronnin, et al.  Fisher Kernels on Visual Vocabularies for Image Categorization, 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Pushmeet Kohli, et al.  Multiple Choice Learning: Learning to Produce Multiple Structured Outputs, 2012, NIPS.

[23]  Thomas Mensink, et al.  Improving the Fisher Kernel for Large-Scale Image Classification, 2010, ECCV.

[24]  Antonio Torralba, et al.  80 Million Tiny Images: A Large Dataset for Non-parametric Object and Scene Recognition, 2008, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[25]  Corinna Cortes, et al.  Support-Vector Networks, 1995, Machine Learning.

[26]  Cordelia Schmid, et al.  A Performance Evaluation of Local Descriptors, 2005, IEEE Trans. Pattern Anal. Mach. Intell.

[27]  Yoshua Bengio, et al.  Learning Deep Architectures for AI, 2007, Found. Trends Mach. Learn.

[28]  Fei-Fei Li, et al.  ImageNet: A large-scale hierarchical image database, 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[29]  Cordelia Schmid, et al.  Aggregating Local Image Descriptors into Compact Codes, 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[30]  Thorsten Joachims, et al.  Learning structural SVMs with latent variables, 2009, ICML '09.