Detecting Cross-Cultural Differences Using a Multilingual Topic Model

Understanding cross-cultural differences has important implications for world affairs and for many aspects of social life. Yet the majority of text-mining methods to date focus on monolingual text. In contrast, we present a statistical model that simultaneously learns a set of common topics from multilingual, non-parallel data and automatically discovers differences in perspective on these topics across linguistic communities. We perform a behavioural evaluation of a subset of the differences identified by our model in English and Spanish to investigate their psychological validity.
