Bayesian nonparametric inference of latent topic hierarchies for multimodal data

Research on multimodal data analysis, such as annotated image analysis, is becoming increasingly important as the amount of available data grows. One approach to this problem is the multimodal topic model, an extension of latent Dirichlet allocation (LDA). Symmetric correspondence topic models (SymCorrLDA) are state-of-the-art multimodal topic models that can appropriately model multimodal data by taking inter-modal dependencies into account. Hierarchically structured categories, meanwhile, can help users find relevant data in a large data collection. Hierarchical topic models such as hierarchical latent Dirichlet allocation (hLDA) can discover a tree-structured hierarchy of latent topics from a given unimodal data collection; however, no existing hierarchical topic model can appropriately handle multimodal data while considering mutual inter-modal dependencies. In this paper, we propose h-SymCorrLDA, which discovers latent topic hierarchies from multimodal data by combining the ideas behind these two kinds of models: multimodal topic models and hierarchical topic models. We demonstrate the effectiveness of our model compared with several baseline models through experiments on two datasets of annotated images.
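For readers unfamiliar with how hLDA obtains a tree of topics, the following is a minimal Python sketch of the nested Chinese restaurant process (nCRP), the prior that hLDA places over root-to-leaf paths in the topic tree. It covers only this path prior; it does not model topic-word distributions, Gibbs sampling, or the multimodal extension proposed in this paper. The names `Node`, `sample_path`, `depth`, and `gamma` are hypothetical and chosen for illustration only.

```python
import random
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical topic-tree node; `visits` counts documents whose path passes through it."""
    visits: int = 0
    children: list["Node"] = field(default_factory=list)


def sample_path(root: Node, depth: int, gamma: float, rng: random.Random) -> list[int]:
    """Draw one document's root-to-leaf path of `depth` levels under a nested CRP.

    Returns the child index chosen at each level below the root (depth - 1 entries).
    """
    node = root
    node.visits += 1
    path = []
    for _ in range(depth - 1):
        # Children's visit counts sum to the number of earlier documents through `node`.
        # Existing child c is chosen with prob visits(c) / (total + gamma);
        # a brand-new child (a new topic) with prob gamma / (total + gamma).
        total = sum(c.visits for c in node.children)
        r = rng.uniform(0, total + gamma)
        chosen = None
        for i, child in enumerate(node.children):
            if r < child.visits:
                chosen = i
                break
            r -= child.visits
        if chosen is None:
            node.children.append(Node())
            chosen = len(node.children) - 1
        node = node.children[chosen]
        node.visits += 1
        path.append(chosen)
    return path


# Usage: draw paths for 20 documents in a three-level tree.
rng = random.Random(0)
root = Node()
paths = [sample_path(root, depth=3, gamma=1.0, rng=rng) for _ in range(20)]
```

Popular branches attract more documents ("rich get richer"), while `gamma` controls how readily new subtopics are created, which is how the tree grows without fixing the number of topics in advance.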
