Supervised Deep Polylingual Topic Modeling for Scholarly Information Recommendations

Polylingual text processing is important for content-based and hybrid recommender systems. It allows a recommender system to extract content information from a broader range of sources, and it enables the system to recommend items in a user’s native language. We propose a cross-lingual keyword recommendation method based on a polylingual topic model, which we further extend with a popular deep learning architecture, the CNN-RNN model. With this model, keywords can be recommended from text written in different languages, and the model parameters remain meaningful and interpretable. We evaluate the proposed method on cross-lingual bibliographic databases that contain both English and Japanese abstracts and keywords.
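The abstract does not spell out how keywords are scored across languages. As a rough illustration only, the sketch below assumes the standard polylingual topic model structure, in which aligned documents share one topic-proportion vector θ while each language has its own topic-word distributions φ. It folds an English document into the topics with a simple EM fold-in (English φ held fixed), then ranks Japanese keywords by Σ_k θ_k φ^ja_{k,w}. The EM fold-in is a swapped-in technique, not the paper's inference procedure, and it does not include the supervised CNN-RNN extension. The functions `infer_theta` and `recommend_keywords`, the toy vocabularies, and all probabilities are hypothetical.

```python
# Minimal sketch of cross-lingual keyword scoring with a polylingual topic model.
# Assumption: aligned documents share topic proportions theta, while each language
# has its own topic-word distributions phi[k][word]. All names and numbers are
# hypothetical toy values, not parameters learned by the paper's model.

PROB_FLOOR = 1e-12  # probability assigned to words a topic never emits


def infer_theta(doc, phi, n_iter=50, alpha=0.1):
    """Fold a tokenized document into the topics by EM, keeping phi fixed.

    alpha is a small pseudo-count that smooths the topic proportions.
    """
    num_topics = len(phi)
    theta = [1.0 / num_topics] * num_topics
    for _ in range(n_iter):
        expected = [alpha] * num_topics
        for word in doc:
            # E-step: posterior responsibility of each topic for this token.
            weights = [theta[k] * phi[k].get(word, PROB_FLOOR) for k in range(num_topics)]
            total = sum(weights)
            for k in range(num_topics):
                expected[k] += weights[k] / total
        # M-step: renormalize expected topic counts into proportions.
        norm = sum(expected)
        theta = [c / norm for c in expected]
    return theta


def recommend_keywords(theta, phi_target, top_n=3):
    """Rank target-language words by p(w | theta) = sum_k theta_k * phi_target[k][w]."""
    vocab = sorted({w for topic in phi_target for w in topic})
    scores = {
        w: sum(theta[k] * phi_target[k].get(w, 0.0) for k in range(len(theta)))
        for w in vocab
    }
    # Sort by descending score; break ties by string order for determinism.
    return sorted(vocab, key=lambda w: (-scores[w], w))[:top_n]


# Toy two-topic model: topic 0 ~ "recommendation", topic 1 ~ "topic modeling".
phi_en = [
    {"recommendation": 0.5, "user": 0.4, "model": 0.1},
    {"topic": 0.5, "model": 0.4, "user": 0.1},
]
phi_ja = [
    {"推薦": 0.6, "ユーザ": 0.4},
    {"トピック": 0.5, "モデル": 0.5},
]

english_doc = ["recommendation", "user", "user", "recommendation"]
theta = infer_theta(english_doc, phi_en)
top_keywords = recommend_keywords(theta, phi_ja, top_n=2)
# For these toy values, top_keywords == ["推薦", "ユーザ"].
```

Because θ is shared across languages, an English document dominated by the "recommendation" topic yields Japanese keywords from that same topic, even though no Japanese text was observed for it.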
