Tracking the Evolution of Words with Time-reflective Text Representations

More than 80% of today’s data is unstructured, and these unstructured datasets evolve over time. A large part of this data consists of text: documents from media outlets, scholarly articles in digital libraries, findings from scientific and professional communities, and social media posts. Vector space models were developed so that data mining and machine learning algorithms can analyze text. Although many vector space models exist for text data, vector-based representations still lack the evolutionary aspect of ever-changing text corpora. Word embeddings have made it possible to build a contextual vector space, but they do not capture the temporal aspects of the feature space. This paper presents an approach that includes temporal aspects in feature spaces. Including time in the feature space yields a vector for every natural language element, such as a word or an entity, at every timestamp. Such temporal word vectors allow us to track how the meaning of a word changes over time by studying the changes in its neighborhood. Moreover, a time-reflective text representation paves the way for a new set of text analytics capabilities involving time series for text collections.

Specifically, we present a time-reflective vector space model for temporal text data that captures both short-term and long-term changes in the meaning of words. We compare our approach with the limited literature on dynamic embeddings. We present qualitative and quantitative evaluations using the tracking of semantic evolution as the target application.
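
To make the idea of tracking meaning through neighborhood changes concrete, the following minimal sketch compares a word's nearest neighbors at two timestamps. It is an illustration under stated assumptions, not the model proposed in this paper: the toy vocabulary, the 2-D vectors, and the helper names `nearest_neighbors` and `neighborhood_drift` are all hypothetical, and the drift score (one minus the Jaccard overlap of the two neighbor sets) is just one simple way to quantify such a change.

```python
import numpy as np

def nearest_neighbors(word, vocab, vectors, k=2):
    """Return the set of k words closest to `word` by cosine similarity."""
    unit = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    idx = vocab.index(word)
    sims = unit @ unit[idx]
    sims[idx] = -np.inf  # exclude the query word itself
    top = np.argsort(-sims)[:k]
    return {vocab[i] for i in top}

def neighborhood_drift(word, vocab, vectors_t1, vectors_t2, k=2):
    """1 - Jaccard overlap of the word's k-nearest-neighbor sets at two timestamps."""
    n1 = nearest_neighbors(word, vocab, vectors_t1, k)
    n2 = nearest_neighbors(word, vocab, vectors_t2, k)
    return 1.0 - len(n1 & n2) / len(n1 | n2)

# Hypothetical toy data: one vector per word at each of two timestamps.
vocab = ["apple", "fruit", "pie", "phone", "laptop"]
vectors_t1 = np.array([
    [1.0, 0.1],  # apple sits near the food-related words
    [0.9, 0.2],  # fruit
    [0.8, 0.1],  # pie
    [0.1, 1.0],  # phone
    [0.2, 0.9],  # laptop
])
vectors_t2 = vectors_t1.copy()
vectors_t2[0] = [0.2, 1.0]  # apple moves toward the technology-related words

print(nearest_neighbors("apple", vocab, vectors_t1))
print(nearest_neighbors("apple", vocab, vectors_t2))
print(neighborhood_drift("apple", vocab, vectors_t1, vectors_t2))
```

Because neighbors are computed within each timestamp's own space, this comparison does not require the two spaces to be aligned with each other; only the vocabulary must be shared.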
