Nonparametric Bayesian upstream supervised multi-modal topic models

Learning with multi-modal data is at the core of many multimedia applications, such as cross-modal retrieval and image annotation. In this paper, we present a nonparametric Bayesian approach to learning upstream supervised topic models for analyzing multi-modal data. Our model develops a compound nonparametric Bayesian multi-modal prior to describe the correlation structure of data both within each individual modality and between different modalities. It extends the hierarchical Dirichlet process (HDP) by incorporating upstream supervised response variables and the values of latent functions under a Gaussian process (GP) prior. Upstream responses shared by data from multiple modalities are beneficial for discriminative training, and the GP allows flexible structure learning of correlations. Hence, our model inherits the automatic determination of the number of topics from the HDP, structure learning from the GP, and enhanced predictive capacity from upstream supervision. We also provide efficient variational inference and prediction algorithms. Empirical studies demonstrate superior performance on several benchmark datasets compared with previous competing methods.
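
To illustrate what "automatic determination of the number of topics" means for a Dirichlet process prior, the following is a minimal sketch of a truncated stick-breaking draw. It is not the model described in this paper: it assumes a single-level Dirichlet process with no upstream responses and no GP latent functions, and the function names, the concentration value `alpha`, the truncation level, and the 0.99 mass threshold are all hypothetical choices made for illustration.

```python
import random


def stick_breaking_weights(alpha, truncation, seed=0):
    """Draw topic weights from a truncated stick-breaking construction.

    Illustrative only: a single-level Dirichlet process, not the paper's
    full HDP-based model. `alpha` is the concentration parameter.
    """
    rng = random.Random(seed)
    weights = []
    remaining = 1.0  # length of the stick not yet assigned to a topic
    for _ in range(truncation - 1):
        v = rng.betavariate(1.0, alpha)  # fraction broken off the remainder
        weights.append(remaining * v)
        remaining *= 1.0 - v
    weights.append(remaining)  # last topic absorbs the leftover mass
    return weights


def effective_topics(weights, mass=0.99):
    """Count how many of the largest weights are needed to cover `mass`."""
    total = 0.0
    for k, w in enumerate(sorted(weights, reverse=True), start=1):
        total += w
        if total >= mass:
            return k
    return len(weights)


weights = stick_breaking_weights(alpha=2.0, truncation=50)
print("topics carrying 99% of the mass:", effective_topics(weights))
```

Although the truncation allows 50 topics, most of the probability mass typically concentrates on far fewer, so the number of topics in effective use is governed by the prior and the data rather than fixed in advance.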
