Good Practice in Large-Scale Learning for Image Classification

We propose a benchmark of several objective functions for large-scale image classification: we compare the one-vs-rest, multiclass, ranking and weighted average ranking SVMs. Using stochastic gradient descent optimization, we can scale the learning to millions of images and thousands of classes. Our experimental evaluation shows that ranking based algorithms do not outperform a one-vs-rest strategy and that the gap between the different algorithms reduces in case of high-dimensional data. We also show that for one-vs-rest, learning through cross-validation the optimal degree of imbalance between the positive and the negative samples can have a significant impact. Furthermore, early stopping can be used as an effective regularization strategy when training with stochastic gradient algorithms. Following these "good practices", we were able to improve the state-of-the-art on a large subset of 10K classes and 9M of images of lmageNet from 16.7% accuracy to 19.1%.

[1]  Lawrence D. Jackel,et al.  Handwritten Digit Recognition with a Back-Propagation Network , 1989, NIPS.

[2]  Thomas G. Dietterich,et al.  Solving Multiclass Learning Problems via Error-Correcting Output Codes , 1994, J. Artif. Intell. Res..

[3]  Jorma Rissanen,et al.  SLIQ: A Fast Scalable Classifier for Data Mining , 1996, EDBT.

[4]  Jason Weston,et al.  Multi-Class Support Vector Machines , 1998 .

[5]  Thorsten Joachims,et al.  Making large-scale support vector machine learning practical , 1999 .

[6]  Jason Weston,et al.  Support vector machines for multi-class pattern recognition , 1999, ESANN.

[7]  John C. Platt,et al.  Fast training of support vector machines using sequential minimal optimization, advances in kernel methods , 1999 .

[8]  Yoram Singer,et al.  Reducing Multiclass to Binary: A Unifying Approach for Margin Classifiers , 2000, J. Mach. Learn. Res..

[9]  Maliha S. Nash,et al.  Handbook of Parametric and Nonparametric Statistical Procedures , 2001, Technometrics.

[10]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[11]  J. van Leeuwen,et al.  Neural Networks: Tricks of the Trade , 2002, Lecture Notes in Computer Science.

[12]  Thorsten Joachims,et al.  Optimizing search engines using clickthrough data , 2002, KDD.

[13]  Andrew Zisserman,et al.  Video Google: a text retrieval approach to object matching in videos , 2003, Proceedings Ninth IEEE International Conference on Computer Vision.

[14]  Steven Salzberg,et al.  On Comparing Classifiers: Pitfalls to Avoid and a Recommended Approach , 1997, Data Mining and Knowledge Discovery.

[15]  Jennifer G. Dy,et al.  A hierarchical method for multi-class support vector machines , 2004, ICML.

[16]  Salvatore J. Stolfo,et al.  On the Accuracy of Meta-learning for Scalable Data Mining , 2004, Journal of Intelligent Information Systems.

[17]  G LoweDavid,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004 .

[18]  David G. Lowe,et al.  Distinctive Image Features from Scale-Invariant Keypoints , 2004, International Journal of Computer Vision.

[19]  Ryan M. Rifkin,et al.  In Defense of One-Vs-All Classification , 2004, J. Mach. Learn. Res..

[20]  G. Wahba,et al.  Multicategory Support Vector Machines , Theory , and Application to the Classification of Microarray Data and Satellite Radiance Data , 2003 .

[21]  JOHANNES GEHRKE,et al.  RainForest—A Framework for Fast Decision Tree Construction of Large Datasets , 1998, Data Mining and Knowledge Discovery.

[22]  Gabriela Csurka,et al.  Visual categorization with bags of keypoints , 2002, eccv 2004.

[23]  Philip D. Plowright,et al.  Convexity , 2019, Optimization for Chemical and Biochemical Engineering.

[24]  Y. LeCun,et al.  Learning methods for generic object recognition with invariance to pose and lighting , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[25]  Thomas P. Hayes,et al.  Error limiting reductions between classification tasks , 2005, ICML.

[26]  Ambuj Tewari,et al.  On the Consistency of Multiclass Classification Methods , 2007, J. Mach. Learn. Res..

[27]  John Shawe-Taylor,et al.  Improving "bag-of-keypoints" image categorisation: Generative Models and PDF-Kernels , 2005 .

[28]  Lior Rokach,et al.  Top-down induction of decision trees classifiers - a survey , 2005, IEEE Transactions on Systems, Man, and Cybernetics, Part C (Applications and Reviews).

[29]  Thomas Hofmann,et al.  Large Margin Methods for Structured and Interdependent Output Variables , 2005, J. Mach. Learn. Res..

[30]  Eric Horvitz,et al.  Considering Cost Asymmetry in Learning Classifiers , 2006, J. Mach. Learn. Res..

[31]  Michael I. Jordan,et al.  Convexity, Classification, and Risk Bounds , 2006 .

[32]  Xuelong Li,et al.  Asymmetric bagging and random subspace for support vector machines-based relevance feedback in image retrieval , 2006, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[33]  Thorsten Joachims,et al.  Training linear SVMs in linear time , 2006, KDD '06.

[34]  Cordelia Schmid,et al.  Beyond Bags of Features: Spatial Pyramid Matching for Recognizing Natural Scene Categories , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[35]  Samy Bengio,et al.  A Discriminative Approach for the Retrieval of Images from Text Queries , 2006, ECML.

[36]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[37]  Filip Radlinski,et al.  A support vector method for optimizing average precision , 2007, SIGIR.

[38]  Jason Weston,et al.  Solving multiclass support vector machines with LaRank , 2007, ICML '07.

[39]  Florent Perronnin,et al.  Fisher Kernels on Visual Vocabularies for Image Categorization , 2007, 2007 IEEE Conference on Computer Vision and Pattern Recognition.

[40]  Léon Bottou,et al.  The Tradeoffs of Large Scale Learning , 2007, NIPS.

[41]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2007, ICML '07.

[42]  Chih-Jen Lin,et al.  A dual coordinate descent method for large-scale linear SVM , 2008, ICML '08.

[43]  Tie-Yan Liu,et al.  Directly optimizing evaluation measures in learning to rank , 2008, SIGIR.

[44]  Sören Sonnenburg,et al.  Optimized cutting plane algorithm for support vector machines , 2008, ICML '08.

[45]  Chih-Jen Lin,et al.  LIBLINEAR: A Library for Large Linear Classification , 2008, J. Mach. Learn. Res..

[46]  Antonio Torralba,et al.  Ieee Transactions on Pattern Analysis and Machine Intelligence 1 80 Million Tiny Images: a Large Dataset for Non-parametric Object and Scene Recognition , 2022 .

[47]  Cordelia Schmid,et al.  Constructing Category Hierarchies for Visual Recognition , 2008, ECCV.

[48]  Subhransu Maji,et al.  Max-margin additive classifiers for detection , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[49]  Yanjun Qi,et al.  Supervised semantic indexing , 2009, ECIR.

[50]  Li Fei-Fei,et al.  ImageNet: A large-scale hierarchical image database , 2009, CVPR.

[51]  Yann LeCun,et al.  What is the best multi-stage architecture for object recognition? , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[52]  Gertjan J. Burghouts,et al.  Performance evaluation of local colour invariants , 2009, Comput. Vis. Image Underst..

[53]  Luc Van Gool,et al.  The Pascal Visual Object Classes (VOC) Challenge , 2010, International Journal of Computer Vision.

[54]  Patrick Gallinari,et al.  Ranking with ordered weighted pairwise classification , 2009, ICML '09.

[55]  Gang Wang,et al.  Learning image similarity from Flickr groups using Stochastic Intersection Kernel MAchines , 2009, 2009 IEEE 12th International Conference on Computer Vision.

[56]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[57]  Yihong Gong,et al.  Linear spatial pyramid matching using sparse coding for image classification , 2009, CVPR.

[58]  Dima Damen,et al.  Recognizing linked events: Searching the space of feasible explanations , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[59]  Yihong Gong,et al.  Locality-constrained Linear Coding for image classification , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[60]  Thomas Mensink,et al.  Improving the Fisher Kernel for Large-Scale Image Classification , 2010, ECCV.

[61]  Florent Perronnin,et al.  Large-scale image categorization with explicit data embedding , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[62]  Jean Ponce,et al.  Learning mid-level features for recognition , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[63]  Thomas S. Huang,et al.  Image Classification Using Super-Vector Coding of Local Image Descriptors , 2010, ECCV.

[64]  Hao Su,et al.  Object Bank: A High-Level Image Representation for Scene Classification & Semantic Feature Sparsification , 2010, NIPS.

[65]  Cor J. Veenman,et al.  Visual Word Ambiguity , 2010, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[66]  Andrew Zisserman,et al.  Efficient additive kernels via explicit feature maps , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[67]  Fei-Fei Li,et al.  What Does Classifying More Than 10, 000 Image Categories Tell Us? , 2010, ECCV.

[68]  Jason Weston,et al.  Label Embedding Trees for Large Multi-Class Tasks , 2010, NIPS.

[69]  Jason Weston,et al.  Large scale image annotation: learning to rank with joint word-image embeddings , 2010, Machine Learning.

[70]  Andrew W. Fitzgibbon,et al.  Efficient Object Category Recognition Using Classemes , 2010, ECCV.

[71]  Chih-Jen Lin,et al.  LIBSVM: A library for support vector machines , 2011, TIST.

[72]  Cordelia Schmid,et al.  Product Quantization for Nearest Neighbor Search , 2011, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[73]  Jason Weston,et al.  WSABIE: Scaling Up to Large Vocabulary Image Annotation , 2011, IJCAI.

[74]  Ming Yang,et al.  Large-scale image classification: Fast feature extraction and SVM training , 2011, CVPR 2011.

[75]  Thomas Deselaers,et al.  Visual and semantic similarity in ImageNet , 2011, CVPR 2011.

[76]  Svetlana Lazebnik,et al.  Comparing data-dependent and data-independent embeddings for classification and ranking of Internet images , 2011, CVPR 2011.

[77]  Andrew Zisserman,et al.  The devil is in the details: an evaluation of recent feature encoding methods , 2011, BMVC.

[78]  Sebastian Nowozin,et al.  Structured Learning and Prediction in Computer Vision , 2011, Found. Trends Comput. Graph. Vis..

[79]  Alexander C. Berg,et al.  Fast and Balanced: Efficient Label Tree Learning for Large Scale Object Recognition , 2011, NIPS.

[80]  Lorenzo Torresani,et al.  Scalable object-class retrieval with approximate and top-k ranking , 2011, 2011 International Conference on Computer Vision.

[81]  Daphne Koller,et al.  Discriminative learning of relaxed hierarchy for large-scale visual recognition , 2011, 2011 International Conference on Computer Vision.

[82]  Bernt Schiele,et al.  Evaluating knowledge transfer and zero-shot learning in a large-scale setting , 2011, CVPR 2011.

[83]  Andrew W. Fitzgibbon,et al.  PiCoDes: Learning a Compact Code for Novel-Category Recognition , 2011, NIPS.

[84]  Eric P. Xing,et al.  Large-Scale Category Structure Aware Image Categorization , 2011, NIPS.

[85]  Yoram Singer,et al.  Pegasos: primal estimated sub-gradient solver for SVM , 2011, Math. Program..

[86]  Florent Perronnin,et al.  High-dimensional signature compression for large-scale image classification , 2011, CVPR 2011.

[87]  Andrew Zisserman,et al.  Sparse kernel approximations for efficient classification and detection , 2012, 2012 IEEE Conference on Computer Vision and Pattern Recognition.

[88]  Marc'Aurelio Ranzato,et al.  Large Scale Distributed Deep Networks , 2012, NIPS.

[89]  Cordelia Schmid,et al.  Aggregating Local Image Descriptors into Compact Codes , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[90]  Cordelia Schmid,et al.  Towards good practice in large-scale learning for image classification , 2012, CVPR.

[91]  Klaus-Robert Müller,et al.  Efficient BackProp , 2012, Neural Networks: Tricks of the Trade.

[92]  Pascal Vincent,et al.  Representation Learning: A Review and New Perspectives , 2012, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[93]  Marc'Aurelio Ranzato,et al.  Building high-level features using large scale unsupervised learning , 2011, 2013 IEEE International Conference on Acoustics, Speech and Signal Processing.