Multi-Instance Multi-Label Learning for Image Classification with Large Vocabularies

The Multiple Instance Multiple Label (MIML) learning problem has received much attention in the machine learning and computer vision literature because of its applications in image classification and object detection. However, current state-of-the-art solutions to this problem scale poorly and cannot be applied to datasets with a large number of instances and a large number of labels. In this paper we present a novel learning algorithm for MIML learning that scales to large datasets and performs comparably to state-of-the-art algorithms. The proposed algorithm trains a set of discriminative multiple instance classifiers, one for each label in the vocabulary of all possible labels, and models the correlations among labels by finding a low-rank weight matrix, thus forcing the classifiers to share weights. The algorithm is a linear model, unlike the state-of-the-art kernel methods, which need to compute the kernel matrix. The model parameters are learned efficiently by solving an unconstrained optimization problem, for which Stochastic Gradient Descent can be used to avoid storing all the data in memory.
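
To make the idea of per-label linear classifiers that share weights through a low-rank matrix more concrete, here is a minimal sketch in plain Python. It is not the paper's exact formulation. Several choices are assumptions made only for illustration: the weight matrix is written as an explicit factorization W = U V^T, a bag is scored by the maximum over its instances, the loss is a hinge loss, and each factor gets a squared-norm penalty. The names `project`, `bag_score`, and `sgd_step` are hypothetical.

```python
def dot(a, b):
    """Inner product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


def project(U, x):
    """Return U^T x: the k-dimensional embedding shared by all labels."""
    k = len(U[0])
    return [sum(U[i][j] * x[i] for i in range(len(x))) for j in range(k)]


def bag_score(U, v, bag):
    """Score one bag for one label (assumption: max over instances).

    The label's weight vector is w = U v, so w . x = v . (U^T x).
    Returns (score, index of the highest-scoring instance).
    """
    scores = [dot(v, project(U, x)) for x in bag]
    best = max(range(len(bag)), key=lambda i: scores[i])
    return scores[best], best


def sgd_step(U, V, bag, labels, lr=0.1, reg=1e-3):
    """One stochastic subgradient step on a single bag.

    U: d x k shared factor, V: one k-vector per label, labels[l] in {+1, -1}.
    Only this bag is touched, so the full dataset never has to sit in memory.
    Returns the total hinge loss of the bag measured before the update.
    """
    loss = 0.0
    for l, y in enumerate(labels):
        score, best = bag_score(U, V[l], bag)
        x = bag[best]              # instance responsible for the bag score
        z = project(U, x)
        v_old = list(V[l])         # keep old values for the U update
        active = y * score < 1.0   # hinge loss is nonzero inside the margin
        loss += max(0.0, 1.0 - y * score)
        for j in range(len(v_old)):
            grad = reg * V[l][j] - (y * z[j] if active else 0.0)
            V[l][j] -= lr * grad
        if active:
            for i in range(len(x)):
                for j in range(len(v_old)):
                    U[i][j] += lr * y * x[i] * v_old[j]
    # Assumed regularizer: shrink the shared factor once per bag.
    for row in U:
        for j in range(len(row)):
            row[j] -= lr * reg * row[j]
    return loss
```

Because every label reuses the same factor `U`, an update triggered by one label changes the embedding seen by all the others, which is one way to read "forcing the classifiers to share weights". Calling `sgd_step` repeatedly on bags streamed from disk would give a training loop under these assumptions.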
