Learning Diachronic Word Embeddings with Iterative Stable Information Alignment

Diachronic word embedding aims to reveal the semantic evolution of words over time. Previous works first learned word embeddings for each time period separately and then aligned all of them into the same vector space. In contrast, we iteratively identify stable words, whose meanings remain acceptably stable across time periods, and use them as anchors to improve both embedding learning and alignment. To learn word embeddings in the same vector space, we apply two different cross-time constraints during training. Initially, we identify the most obvious stable words with an unconstrained model and then use a hard constraint to restrain them within their related stable time periods. In the iterative process, we identify new stable words from the previously trained model and apply a soft constraint to them to fine-tune the model. We use the COHA dataset (https://corpus.byu.edu/coha/) [14], which consists of texts from the 1810s to the 2000s. Both qualitative and quantitative evaluations show that our model can accurately capture word meanings in each individual time period and model changes in word meaning. Experimental results indicate that our proposed model outperforms all baseline methods on diachronic text evaluation.
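The two ingredients described above, selecting stable words as anchors and softly constraining them across time periods, can be illustrated with a minimal sketch. This is not the authors' implementation: the abstract does not specify the stability criterion or the form of the constraints, so the sketch assumes a nearest-neighbor overlap test (Jaccard similarity of neighbor sets, which does not require the two spaces to be aligned) and a squared-distance penalty. The function names `stable_words` and `soft_penalty`, and the parameters `k`, `threshold`, and `weight`, are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two vectors given as lists of floats."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def nearest_neighbors(word, emb, k):
    """Return the set of the k words closest to `word` within one period."""
    others = [w for w in emb if w != word]
    others.sort(key=lambda w: cosine(emb[word], emb[w]), reverse=True)
    return set(others[:k])


def stable_words(emb_a, emb_b, k=2, threshold=0.5):
    """Assumed criterion: a word is stable if its neighbor sets in the two
    periods overlap (Jaccard similarity) by at least `threshold`."""
    shared = sorted(set(emb_a) & set(emb_b))
    sub_a = {w: emb_a[w] for w in shared}
    sub_b = {w: emb_b[w] for w in shared}
    stable = []
    for w in shared:
        na = nearest_neighbors(w, sub_a, k)
        nb = nearest_neighbors(w, sub_b, k)
        union = na | nb
        overlap = len(na & nb) / len(union) if union else 0.0
        if overlap >= threshold:
            stable.append(w)
    return stable


def soft_penalty(emb_a, emb_b, anchors, weight=1.0):
    """Assumed soft constraint: weighted squared distance between the two
    period-specific vectors of each anchor word, added to the training loss."""
    return weight * sum(
        sum((a - b) ** 2 for a, b in zip(emb_a[w], emb_b[w])) for w in anchors
    )
```

Under the same assumptions, a hard constraint would correspond to forcing an anchor word to share a single vector across its stable time periods, whereas the soft penalty only discourages the vectors from drifting apart during fine-tuning.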

[1] Jacob Goldberger, et al. Aligning Vector-spaces with Noisy Supervised Lexicons, 2019, NAACL-HLT.

[2] Kira Radinsky, et al. Learning Word Relatedness over Time, 2017, EMNLP.

[3] Jim Q. Smith, et al. GASC: Genre-Aware Semantic Change for Ancient Greek, 2019, LChange@ACL.

[4] Björn-Olav Dozo, et al. Quantitative Analysis of Culture Using Millions of Digitized Books, 2010.

[5] S. Tagliamonte, et al. LINGUISTIC RUIN? LOL! INSTANT MESSAGING AND TEEN LANGUAGE, 2008.

[6] Gerhard Heyer, et al. Change of Topics over Time - Tracking Topics by their Change of Meaning, 2009, KDIR.

[7] Hui Xiong, et al. Dynamic Word Embeddings for Evolving Semantic Discovery, 2017, WSDM.

[8] Mirella Lapata, et al. A Bayesian Model of Diachronic Meaning Change, 2016, TACL.

[9] Mark Davies, et al. The 400 million word Corpus of Historical American English (1810–2009), 2012.

[10] Erik Velldal, et al. Tracing armed conflicts with diachronic word embedding models, 2017, NEWS@ACL.

[11] David M. Blei, et al. Dynamic Bernoulli Embeddings for Language Evolution, 2017, ArXiv.

[12] Jure Leskovec, et al. Cultural Shift or Linguistic Drift? Comparing Two Computational Measures of Semantic Change, 2016, EMNLP.

[13] Sourav S. Bhowmick, et al. The Past is Not a Foreign Country: Detecting Semantically Similar Terms across Time, 2016, IEEE Transactions on Knowledge and Data Engineering.

[14] Marco Baroni, et al. A distributional similarity approach to the detection of semantic change in the Google Books Ngram corpus, 2011, GEMS.

[15] Chiraag Lala, et al. Word vector-space embeddings of natural language data over time, 2014.

[16] Jeffrey Dean, et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[17] D. Wijaya, et al. Understanding semantic change of words over centuries, 2011, DETECT '11.

[18] Daniel Jurafsky, et al. Word embeddings quantify 100 years of gender and ethnic stereotypes, 2017, Proceedings of the National Academy of Sciences.

[19] Hui Xiong, et al. Discovery of Evolving Semantics through Dynamic Word Embedding Learning, 2017, ArXiv.

[20] Stephan Mandt, et al. Dynamic Word Embeddings via Skip-Gram Filtering, 2017, ArXiv.

[21] Jure Leskovec, et al. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change, 2016, ACL.

[22] John D. Lafferty, et al. Dynamic topic models, 2006, ICML.

[23] David Crystal, et al. Internet Linguistics: A Student Guide, 2011.

[24] Terrence Szymanski, et al. Temporal Word Analogies: Identifying Lexical Replacement with Diachronic Word Embeddings, 2017, ACL.

[25] Christian Biemann, et al. That’s sick dude!: Automatic identification of word sense change across different timescales, 2014, ACL.

[26] Guy Merchant. Teenagers in cyberspace: an investigation of language use and language change in internet chatrooms, 2001.

[27] Carlo Strapparava, et al. SemEval 2015, Task 7: Diachronic Text Evaluation, 2015, *SEMEVAL.

[28] Ellen Isaacs, et al. Teen use of messaging media, 2002, CHI Extended Abstracts.

[29] Slav Petrov, et al. Temporal Analysis of Language through Neural Language Models, 2014, LTCSS@ACL.

[30] Steven Skiena, et al. Statistically Significant Detection of Linguistic Change, 2014, WWW.