UvA-DARE (Digital Academic Repository) Words are Malleable: Computing Semantic Shifts in Political and Media Discourse

Recently, researchers have started to pay attention to the detection of temporal shifts in the meaning of words. However, most (if not all) of these approaches restrict their efforts to uncovering change over time, thus neglecting other valuable dimensions such as social or political variability. We propose an approach for detecting semantic shifts between different viewpoints, where a viewpoint is broadly defined as a set of texts that share a specific metadata feature, which can be a time period but also a social entity such as a political party. For each viewpoint, we learn a semantic space in which each word is represented as a low-dimensional neural embedding vector. The challenge is to compare the meaning of a word in one space to its meaning in another space and measure the size of the semantic shift. We compare the effectiveness of a measure based on optimal transformations between the two spaces with a measure based on the similarity of the neighbors of the word in the respective spaces. Our experiments demonstrate that the combination of these two measures performs best. We show that semantic shifts occur not only over time but also across different viewpoints within a short period of time. For evaluation, we demonstrate how this approach captures meaningful semantic shifts and can help improve other tasks such as contrastive viewpoint summarization and ideology detection (measured as classification accuracy) in political texts. We also show that the two laws of semantic change, which were empirically shown to hold for temporal shifts, also hold for shifts across viewpoints. These laws state that frequent words are less likely to shift meaning, while words with many senses are more likely to do so.
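To make the two kinds of measure concrete, the following is a minimal sketch and not the authors' implementation. It assumes two embedding matrices `A` and `B` whose rows are vectors for the same vocabulary in two viewpoints. As illustrative choices that the abstract does not specify, it uses a least-squares linear map for the transformation-based measure, Jaccard overlap of the k nearest neighbors for the neighbor-based measure, and a weighted average for the combination. All function names and the parameters `k` and `alpha` are hypothetical.

```python
import numpy as np


def _unit_rows(M):
    """Scale each row to unit length (guarding against zero rows)."""
    norms = np.linalg.norm(M, axis=1, keepdims=True)
    return M / np.clip(norms, 1e-12, None)


def linear_mapping_shift(A, B):
    """Per-word shift: 1 - cosine(A[w] @ W, B[w]).

    W is the least-squares linear map from space A to space B
    (an assumed stand-in for the 'optimal transformation').
    """
    W, *_ = np.linalg.lstsq(A, B, rcond=None)
    mapped = _unit_rows(A @ W)
    target = _unit_rows(B)
    return 1.0 - np.sum(mapped * target, axis=1)


def neighbor_shift(A, B, k=10):
    """Per-word shift: 1 - Jaccard overlap of the k nearest neighbors
    of the word in each space (cosine similarity, word itself excluded)."""
    def knn(M):
        U = _unit_rows(M)
        sims = U @ U.T
        np.fill_diagonal(sims, -np.inf)  # a word is not its own neighbor
        return np.argsort(-sims, axis=1)[:, :k]

    na, nb = knn(A), knn(B)
    out = np.empty(len(A))
    for i in range(len(A)):
        sa, sb = set(na[i]), set(nb[i])
        out[i] = 1.0 - len(sa & sb) / len(sa | sb)
    return out


def combined_shift(A, B, k=10, alpha=0.5):
    """Weighted average of the two measures (alpha is an assumed weight)."""
    return (alpha * linear_mapping_shift(A, B)
            + (1.0 - alpha) * neighbor_shift(A, B, k))
```

With this sketch, words whose vectors are related by the same global map in both spaces receive a shift near zero, while a word whose position changes relative to the rest of the vocabulary receives a larger score. Ranking the vocabulary by `combined_shift(A, B)` would give candidate shifted words under these assumptions.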
