We Don't Speak the Same Language: Interpreting Polarization through Machine Translation

Polarization among US political parties, media, and elites is a widely studied topic. Prominent lines of prior research across multiple disciplines have observed and analyzed growing polarization in social media. In this paper, we present a new methodology that interprets polarization through the lens of machine translation. Starting from the novel proposition that two sub-communities speak two different \emph{languages}, we demonstrate that modern machine translation methods can provide a simple yet powerful and interpretable framework for understanding, at the granularity of words, the differences between two (or more) large-scale social media discussion data sets. Using a corpus of 86.6 million comments by 6.5 million users on more than 200,000 news videos hosted by the YouTube channels of four prominent US news networks, we show that simple word-level and phrase-level translation pairs can reveal insights into the current political divide: what is \emph{black lives matter} to one community can be \emph{all lives matter} to the other.
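The abstract does not spell out the translation algorithm. One standard way to obtain word-level translation pairs between two corpora works in three steps:

1. Train a separate word embedding space for each community.
2. Fit an orthogonal (Procrustes) map between the two spaces, using anchor tokens that both vocabularies share.
3. Translate a word by nearest-neighbor search in the other space.

The sketch below is a minimal illustration under that assumption. It is not necessarily the authors' exact pipeline. The function names (`normalize`, `procrustes`, `translate`) and the random toy embeddings are hypothetical, and its output is not a result from the paper.

```python
import numpy as np

def normalize(M):
    """L2-normalize rows so that dot products are cosine similarities."""
    return M / np.linalg.norm(M, axis=1, keepdims=True)

def procrustes(X, Y):
    """Orthogonal W minimizing ||X @ W - Y||_F; row i of X and Y is anchor word i."""
    U, _, Vt = np.linalg.svd(X.T @ Y)
    return U @ Vt

def translate(word, src_vocab, src_emb, tgt_vocab, tgt_emb, W, k=1):
    """Map a source-community word through W; return its k nearest target-community words."""
    v = src_emb[src_vocab.index(word)] @ W
    sims = normalize(tgt_emb) @ (v / np.linalg.norm(v))
    return [tgt_vocab[i] for i in np.argsort(-sims)[:k]]

# Hypothetical toy data: 30 shared tokens plus one community-specific phrase token each.
rng = np.random.default_rng(0)
shared = [f"w{i}" for i in range(30)]
src_vocab = shared + ["black_lives_matter"]
tgt_vocab = shared + ["all_lives_matter"]
X = rng.normal(size=(31, 10))                   # source-community embeddings
Q, _ = np.linalg.qr(rng.normal(size=(10, 10)))  # random orthogonal matrix
Y = X @ Q                                       # toy assumption: target space is a pure rotation

# Fit the map on the shared (anchor) tokens only, then translate the non-shared phrase.
W = procrustes(normalize(X[:30]), normalize(Y[:30]))
print(translate("black_lives_matter", src_vocab, X, tgt_vocab, Y, W))  # prints ['all_lives_matter']
```

Constraining `W` to be orthogonal preserves distances and angles within each community's space, so the map can only rotate the geometry and cannot distort it. The choice of shared tokens as anchors is an assumption made for this sketch. In real data, the two spaces are not exact rotations of each other, and the nearest neighbors are candidates to inspect rather than guaranteed equivalents.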
