We Don't Speak the Same Language: Interpreting Polarization through Machine Translation

Polarization among US political parties, media, and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that interprets polarization through the lens of machine translation. Building on the novel proposition that two sub-communities speak two different "languages", we demonstrate that modern machine translation methods provide a simple yet powerful and interpretable framework for understanding, at the granularity of words, the differences between two (or more) large-scale social media discussion data sets. Using a substantial corpus of 86.6 million comments by 6.5 million users on over 200,000 news videos hosted by the YouTube channels of four prominent US news networks, we demonstrate that simple word-level and phrase-level translation pairs can reveal deep insights into the current political divide: what is "black lives matter" to one community can be "all lives matter" to the other.
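The abstract does not spell out how translation pairs are obtained. One common recipe, assumed here rather than taken from the paper, is to train separate word embeddings on each community's comments, learn an orthogonal map between the two spaces from anchor words assumed to mean the same thing in both (orthogonal Procrustes, solved with an SVD), and then "translate" a word by taking the nearest neighbor of its mapped vector in the other space. The sketch below uses synthetic toy vectors and hypothetical tokens `w0` to `w19` instead of real embeddings; the helper names `normalize`, `procrustes`, and `translate` are illustrative, not the authors' code.

```python
import numpy as np


def normalize(m: np.ndarray) -> np.ndarray:
    """Scale each row (one word vector) to unit length."""
    return m / np.linalg.norm(m, axis=1, keepdims=True)


def procrustes(src: np.ndarray, tgt: np.ndarray) -> np.ndarray:
    """Orthogonal W minimizing ||src @ W - tgt||_F; rows of src/tgt are paired anchor vectors."""
    u, _, vt = np.linalg.svd(src.T @ tgt)
    return u @ vt


def translate(word: str, src_vocab: list[str], src_emb: np.ndarray,
              tgt_vocab: list[str], tgt_emb: np.ndarray, w: np.ndarray) -> str:
    """Map a source-community word into the target space and return its nearest target word."""
    v = src_emb[src_vocab.index(word)] @ w
    # Rows are unit-norm and W is orthogonal, so dot products are cosine similarities.
    sims = tgt_emb @ v
    return tgt_vocab[int(np.argmax(sims))]


# --- Toy data (hypothetical; stands in for per-community word embeddings) ---
rng = np.random.default_rng(0)
dim, n_words, n_anchors = 8, 20, 18
vocab = [f"w{i}" for i in range(n_words)]  # shared vocabulary for both communities

emb_a = normalize(rng.normal(size=(n_words, dim)))       # community A's space
rotation, _ = np.linalg.qr(rng.normal(size=(dim, dim)))  # random orthogonal matrix
emb_b = emb_a @ rotation                                 # community B: rotated copy of A ...
emb_b[[18, 19]] = emb_b[[19, 18]]                        # ... except w18 and w19 swap usage

# Learn the map from anchor words (w0..w17) assumed to mean the same in both communities.
w = procrustes(emb_a[:n_anchors], emb_b[:n_anchors])

# Report every word whose nearest "translation" in community B is a different word.
for word in vocab:
    other = translate(word, vocab, emb_a, vocab, emb_b, w)
    if other != word:
        print(f"{word} -> {other}")
```

In this toy setup the two spaces differ only by a random rotation, except that the vectors of `w18` and `w19` are swapped in community B. Because the 18 anchors span all 8 dimensions, the learned map recovers the rotation, so every anchor translates to itself while the loop prints `w18 -> w19` and `w19 -> w18`. Non-identical pairs of this kind are the sort of output the abstract's example illustrates. If the anchors do not span the embedding dimension, the orthogonal map is not uniquely determined and such pairs become unreliable.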

Tom M. Mitchell | Ashiqur R. KhudaBukhsh | Rupak Sarkar | Mark S. Kamlet
