Relation Prediction in Multilingual Data Based on Multimodal Relational Topic Models

There is increasing demand for improved analysis of multimodal data that consist of multiple representations, such as multilingual documents and text-annotated images. Latent topic models are one promising approach for analyzing such data. In this paper, we propose conditionally independent generalized relational topic models (CI-gRTM) for predicting unknown relations across the different representations of multimodal data. We developed CI-gRTM as a multimodal extension of the discriminative relational topic models called generalized relational topic models (gRTM). Experiments with multilingual documents demonstrate that CI-gRTM predicts both multilingual representations and relations between two different language representations more effectively than several state-of-the-art baseline models, each of which can predict either multilingual representations or unimodal relations.

key words: latent topic models, relational topic models, multimodal data, margin maximization
