Dynamic Embeddings for Language Evolution

Word embeddings are a powerful approach for unsupervised analysis of language. Recently, Rudolph et al. developed exponential family embeddings, which cast word embeddings in a probabilistic framework. Here, we develop dynamic embeddings, building on exponential family embeddings to capture how the meanings of words change over time. We use dynamic embeddings to analyze three large collections of historical texts: U.S. Senate speeches from 1858 to 2009, ACM abstracts spanning the history of computer science from 1951 to 2014, and machine learning papers posted to the arXiv from 2007 to 2015. We find that dynamic embeddings provide better fits than classical embeddings and capture interesting patterns in how language changes.
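For a concrete picture of the model class the abstract describes, below is a minimal sketch in Python of a dynamic Bernoulli embedding objective: each time slice gets its own word embedding vectors, tied across slices by a Gaussian random walk, while context vectors are shared across time, and each observed word contributes a Bernoulli conditional approximated with negative samples. This is an illustrative sketch under those assumptions, not the authors' implementation; the names (`rho`, `alpha`, `drift_var`) and the toy dimensions are invented for the example.

```python
# Minimal sketch of a dynamic Bernoulli embedding objective
# (illustrative only; names and sizes are assumptions, not the paper's code).
import numpy as np

rng = np.random.default_rng(0)
V, K, T = 1000, 25, 3                     # toy vocab size, embedding dim, time slices

alpha = rng.normal(0.0, 0.1, (V, K))      # context vectors, shared across all time slices
rho = rng.normal(0.0, 0.1, (T, V, K))     # per-slice word embedding vectors

def log_prior(rho, alpha, lam=1.0, drift_var=1e-3):
    """Gaussian priors on alpha and rho[0], plus a Gaussian random walk
    tying rho[t] to rho[t-1] -- the 'dynamic' part of the model."""
    lp = -0.5 * lam * np.sum(alpha ** 2) - 0.5 * lam * np.sum(rho[0] ** 2)
    for t in range(1, rho.shape[0]):
        lp -= 0.5 * np.sum((rho[t] - rho[t - 1]) ** 2) / drift_var
    return lp

def log_likelihood(word, context, t, n_neg=10):
    """Bernoulli conditional for one position in time slice t: the observed
    word is a 'success'; a few randomly drawn words are 'failures'."""
    ctx = alpha[context].sum(axis=0)                 # sum of context vectors
    eta = rho[t, word] @ ctx
    ll = -np.log1p(np.exp(-eta))                     # log sigmoid(eta) for the observed word
    neg = rng.integers(0, V, n_neg)
    ll -= np.log1p(np.exp(rho[t, neg] @ ctx)).sum()  # log(1 - sigmoid) for negative samples
    return ll

# Toy usage: score one position (word 7 with a 4-word context) in slice 1.
objective = log_prior(rho, alpha) + log_likelihood(word=7, context=[3, 4, 9, 12], t=1)
```

In this sketch the parameters would be fit by maximizing the log joint with stochastic gradients over minibatches of positions; the random-walk variance `drift_var` controls how quickly a word's embedding, and hence its usage, is allowed to drift between time slices.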

[1] Eyal Sagi et al. Tracing semantic change with latent semantic analysis, 2011.

[2] H. Robbins. A Stochastic Approximation Method, 1951.

[3] Omer Levy et al. Neural Word Embedding as Implicit Matrix Factorization, 2014, NIPS.

[4] Geoffrey Zweig et al. Linguistic Regularities in Continuous Space Word Representations, 2013, NAACL.

[5] Steven Skiena et al. Statistically Significant Detection of Linguistic Change, 2014, WWW.

[6] Sanjeev Arora et al. RAND-WALK: A Latent Variable Model Approach to Word Embeddings, 2015.

[7] Christian Biemann et al. That’s sick dude!: Automatic identification of word sense change across different timescales, 2014, ACL.

[8] Geoffrey E. Hinton et al. A Scalable Hierarchical Distributed Language Model, 2008, NIPS.

[9] Xiaohe Chen et al. Semantic change computation: A successive approach, 2013, World Wide Web.

[10] Stephan Mandt et al. Dynamic Word Embeddings via Skip-Gram Filtering, 2017, ArXiv.

[11] J. Aitchison. Language Change: Progress or Decay?, 1981.

[12] Jure Leskovec et al. Diachronic Word Embeddings Reveal Statistical Laws of Semantic Change, 2016, ACL.

[13] Chong Wang et al. Continuous Time Dynamic Topic Models, 2008, UAI.

[14] Ryan Cotterell et al. Explaining and Generalizing Skip-Gram through Exponential Family Principal Component Analysis, 2017, EACL.

[15] Zellig S. Harris. Distributional Structure, 1954.

[16] Andrew McCallum et al. Topics over time: a non-Markov continuous-time model of topical trends, 2006, KDD '06.

[17] Rada Mihalcea et al. Word Epoch Disambiguation: Finding How Words Change Over Time, 2012, ACL.

[18] Andrew McCallum et al. Word Representations via Gaussian Embedding, 2014, ICLR.

[19] David M. Blei et al. Exponential Family Embeddings, 2016, NIPS.

[20] Jeffrey Dean et al. Distributed Representations of Words and Phrases and their Compositionality, 2013, NIPS.

[21] Sean Gerrish et al. A Language-based Approach to Measuring Scholarly Impact, 2010, ICML.

[22] Michael I. Jordan et al. Latent Dirichlet Allocation, 2001, J. Mach. Learn. Res.

[23] Simon Kirby et al. Innateness and culture in the evolution of language, 2006, Proceedings of the National Academy of Sciences.

[24] Dustin Tran et al. Edward: A library for probabilistic modeling, inference, and criticism, 2016, ArXiv.

[25] Koray Kavukcuoglu et al. Learning word embeddings efficiently with noise-contrastive estimation, 2013, NIPS.

[26] Jeffrey Dean et al. Efficient Estimation of Word Representations in Vector Space, 2013, ICLR.

[27] D. Wijaya et al. Understanding semantic change of words over centuries, 2011, DETECT '11.

[28] John D. Lafferty et al. Dynamic topic models, 2006, ICML.

[29] Aapo Hyvärinen et al. Noise-contrastive estimation: A new estimation principle for unnormalized statistical models, 2010, AISTATS.

[30] Jeffrey Pennington et al. GloVe: Global Vectors for Word Representation, 2014, EMNLP.

[31] Yoshua Bengio et al. A Neural Probabilistic Language Model, 2003, J. Mach. Learn. Res.

[32] Mirella Lapata et al. A Bayesian Model of Diachronic Meaning Change, 2016, TACL.

[33] Kevin P. Murphy. Machine Learning: A Probabilistic Perspective, 2012, Adaptive Computation and Machine Learning series.

[34] Christian Biemann et al. An automatic approach to identify word sense changes in text media across timescales, 2015, Natural Language Engineering.

[35] Shri Kant. Machine Learning and Pattern Recognition, 2010.

[36] Geoffrey E. Hinton et al. Learning representations by back-propagating errors, 1986, Nature.

[37] Hui Xiong et al. Discovery of Evolving Semantics through Dynamic Word Embedding Learning, 2017, ArXiv.

[38] Slav Petrov et al. Temporal Analysis of Language through Neural Language Models, 2014, LTCSS@ACL.

[39] Yifan Hu et al. Collaborative Filtering for Implicit Feedback Datasets, 2008, Eighth IEEE International Conference on Data Mining.

[40] David M. Blei et al. Modeling User Exposure in Recommendation, 2015, WWW.

[41] Sourav S. Bhowmick et al. The Past is Not a Foreign Country: Detecting Semantically Similar Terms across Time, 2016, IEEE Transactions on Knowledge and Data Engineering.

[42] Yoshua Bengio et al. Hierarchical Probabilistic Neural Network Language Model, 2005, AISTATS.

[43] R. Mazo. On the theory of Brownian motion, 1973.

[44] B. Arnold et al. Conditionally Specified Distributions: An Introduction (with comments and a rejoinder by the authors), 2001.

[45] Chong Wang et al. Dynamic Language Models for Streaming Text, 2014, TACL.

[46] Yoram Singer et al. Adaptive Subgradient Methods for Online Learning and Stochastic Optimization, 2011, J. Mach. Learn. Res.