Grammar and Meaning: Analysing the Topology of Diachronic Word Embeddings

This paper applies word embeddings to the study of change in language use in the domain of science, focusing on the Late Modern English period (17th to 19th centuries). Historically, this is the period in which many registers of English developed, including the language of science. Our overarching interest is the linguistic development of scientific writing into a distinctive (group of) register(s). A register is marked not only by the choice of lexical words (discourse domain) but, crucially, by grammatical choices that indicate style. The paper focuses on the latter, tracing diachronically those words with primarily grammatical functions (function words and selected polyfunctional word forms). To this end, we combine diachronic word embeddings with visualization and exploratory techniques, such as clustering and relative entropy, to aggregate the data meaningfully and to compare periods.
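As a rough illustration of the relative entropy (Kullback-Leibler divergence) mentioned above, the following minimal sketch compares the word distributions of two time slices. The function name `relative_entropy`, the additive smoothing, and the toy token lists are assumptions made for illustration only; they are not taken from the paper and do not reproduce its pipeline.

```python
import math
from collections import Counter


def relative_entropy(tokens_p, tokens_q, smoothing=1.0):
    """Return D(P || Q) in bits for two token lists.

    Both distributions are estimated over the joint vocabulary with
    additive smoothing (an assumption of this sketch), so that no word
    receives zero probability in Q.
    """
    vocab = set(tokens_p) | set(tokens_q)
    counts_p, counts_q = Counter(tokens_p), Counter(tokens_q)
    total_p = len(tokens_p) + smoothing * len(vocab)
    total_q = len(tokens_q) + smoothing * len(vocab)

    divergence = 0.0
    for word in vocab:
        p = (counts_p[word] + smoothing) / total_p
        q = (counts_q[word] + smoothing) / total_q
        divergence += p * math.log2(p / q)
    return divergence


# Invented toy data standing in for two periods of a corpus.
early_period = "i did observe that the air which is in the vessel".split()
late_period = "the pressure of the gas was observed in the vessel".split()

same = relative_entropy(early_period, early_period)
forward = relative_entropy(late_period, early_period)
backward = relative_entropy(early_period, late_period)
```

Relative entropy is asymmetric, so comparing the later period against the earlier one generally gives a different value from the reverse comparison. A distribution compared with itself yields zero.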
