Max-margin Latent Dirichlet Allocation for Image Classification and Annotation

We present the max-margin latent Dirichlet allocation, a max-margin variant of supervised topic models, for image classification and annotation. Our model for image classification (called MMLDA c ) integrates discriminative classification with generative topic models. Our model for image annotation (called MMLDA a ) extends MMLDA c to the case of multi-label problems, where each image can be associated with more than one annotation terms. We derive efficient learning algorithms for both models and demonstrate experimentally the advantages of our proposed models over other baseline methods.

[1]  Stefano Soatto,et al.  Detecting Humans via Their Pose , 2006, NIPS.

[2]  Michael I. Jordan,et al.  DiscLDA: Discriminative Learning for Dimensionality Reduction and Classification , 2008, NIPS.

[3]  Eric P. Xing,et al.  MedLDA: maximum margin supervised topic models for regression and classification , 2009, ICML '09.

[4]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[5]  David A. Forsyth,et al.  Matching Words and Pictures , 2003, J. Mach. Learn. Res..

[6]  Yee Whye Teh,et al.  Names and faces in the news , 2004, Proceedings of the 2004 IEEE Computer Society Conference on Computer Vision and Pattern Recognition, 2004. CVPR 2004..

[7]  Fei-Fei Li,et al.  ImageNet: A large-scale hierarchical image database , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[8]  Fei-Fei Li,et al.  Towards total scene understanding: Classification, annotation and segmentation in an automatic framework , 2009, CVPR.

[9]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[10]  Antonio Torralba,et al.  Modeling the Shape of the Scene: A Holistic Representation of the Spatial Envelope , 2001, International Journal of Computer Vision.

[11]  Thomas Hofmann,et al.  Probabilistic latent semantic indexing , 1999, SIGIR '99.

[12]  Ali Farhadi,et al.  Scene Discovery by Matrix Factorization , 2008, ECCV.

[13]  John D. Lafferty,et al.  Correlated Topic Models , 2005, NIPS.

[14]  Alexei A. Efros,et al.  Using Multiple Segmentations to Discover Objects and their Extent in Image Collections , 2006, 2006 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'06).

[15]  Koby Crammer,et al.  On the Algorithmic Implementation of Multiclass Kernel-based Vector Machines , 2002, J. Mach. Learn. Res..

[16]  Thomas Hofmann,et al.  A Collapsed Variational Bayesian Inference Algorithm for Latent Dirichlet Allocation , 2007 .

[17]  Yang Wang,et al.  Human Action Recognition by Semilatent Topic Models , 2009, IEEE Transactions on Pattern Analysis and Machine Intelligence.

[18]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[19]  Antonio Torralba,et al.  LabelMe: A Database and Web-Based Tool for Image Annotation , 2008, International Journal of Computer Vision.

[20]  David M. Blei,et al.  Supervised Topic Models , 2007, NIPS.

[21]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[22]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.