Detecting Common Discussion Topics Across Culture From News Reader Comments

Reader comments on online news websites are typically massive in volume. We investigate the task of Cultural-common Topic Detection (CTD), which aims to discover discussion topics shared across news reader comments written in different languages. We propose a new probabilistic graphical model called MCTA that copes with the language gap and captures the common semantics across languages. For model parameter learning, we also develop a partially collapsed Gibbs sampler that incorporates term translation relationships into the detection of cultural-common topics. Experimental results show improvements over the state-of-the-art model.
