Bayesian Non-parametric Inference of Multimodal Topic Hierarchies

Research on multimodal data analysis such as annotated image analysis is becoming more important than ever due to the increase in the amount of data. One of the approaches to this problem is multimodal topic models as an extension of Latent Dirichlet allocation (LDA). Symmetric correspondence topic models (SymCorrLDA) are state-of-the-art multimodal topic models that can appropriately model multimodal data considering inter-modal dependencies. Incidentally, hierarchically structured categories can help users find relevant data from a large amount of data collection. Hierarchical topic models such as Hierarchical latent Dirichlet allocation (hLDA) can discover a tree-structured hierarchy of latent topics from a given unimodal data collection; however, no hierarchical topic models can appropriately handle multimodal data considering inter-modal mutual dependencies. In this paper, we propose hSymCorrLDA to discover latent topic hierarchies from multimodal data by combining the ideas of the two previously mentioned models: multimodal topic models and hierarchical topic models. We demonstrate the effectiveness of our model compared with several baseline models through experiments with three datasets of annotated images.

[1]  Hugo Larochelle,et al.  Topic Modeling of Multimodal Data: An Autoregressive Approach , 2014, 2014 IEEE Conference on Computer Vision and Pattern Recognition.

[2]  D. Aldous Exchangeability and related topics , 1985 .

[3]  Michael I. Jordan,et al.  Modeling annotated data , 2003, SIGIR.

[4]  Fei-Fei Li,et al.  What, where and who? Classifying events by scene and object recognition , 2007, 2007 IEEE 11th International Conference on Computer Vision.

[5]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, CVPR.

[6]  Alexander J. Smola,et al.  Nested Chinese Restaurant Franchise Process: Applications to User Tracking and Document Modeling , 2013, ICML.

[7]  Mark J. Huiskes,et al.  The MIR flickr retrieval evaluation , 2008, MIR '08.

[8]  Michael I. Jordan,et al.  Latent Dirichlet Allocation , 2001, J. Mach. Learn. Res..

[9]  Matthijs C. Dorst Distinctive Image Features from Scale-Invariant Keypoints , 2011 .

[10]  Pietro Perona,et al.  A Bayesian hierarchical model for learning natural scene categories , 2005, 2005 IEEE Computer Society Conference on Computer Vision and Pattern Recognition (CVPR'05).

[11]  Fei-Fei Li,et al.  Building and using a semantivisual image hierarchy , 2010, 2010 IEEE Computer Society Conference on Computer Vision and Pattern Recognition.

[12]  Eric P. Xing,et al.  Symmetric Correspondence Topic Models for Multilingual Text Analysis , 2012, NIPS.

[13]  Chong Wang,et al.  Simultaneous image classification and annotation , 2009, 2009 IEEE Conference on Computer Vision and Pattern Recognition.

[14]  Thomas L. Griffiths,et al.  Hierarchical Topic Models and the Nested Chinese Restaurant Process , 2003, NIPS.

[15]  Mark Steyvers,et al.  Finding scientific topics , 2004, Proceedings of the National Academy of Sciences of the United States of America.