Back to the Future - Sequential Alignment of Text Representations

Language evolves over time in many ways relevant to natural language processing tasks. For example, recent occurrences of the tokens 'BERT' and 'ELMO' in publications refer to neural network architectures rather than persons. This type of temporal signal is typically overlooked, but it is important if one aims to deploy a machine learning model over an extended period of time. In particular, language evolution causes data drift between time-steps in sequential decision-making tasks. Examples of such tasks include predicting paper acceptance for yearly conferences (regular intervals) and predicting author stance for rumours on Twitter (irregular intervals). Inspired by successes in computer vision, we tackle data drift by sequentially aligning learned representations. We consider both unsupervised and semi-supervised alignment. We evaluate on three challenging tasks that vary in time-scale, linguistic unit, and domain. On these tasks, our method outperforms several strong baselines, including one that uses all available data. We argue that, due to its low computational expense, sequential alignment is a practical solution for dealing with language evolution.
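The abstract does not spell out the alignment procedure. As an illustration only, the sketch below assumes a linear subspace alignment of the kind used in visual domain adaptation: a PCA basis is fitted per time-step, and the earlier step's basis is mapped onto the later one. The function names, the centering step, and the choice of `k` are assumptions made for this example, not details taken from the paper.

```python
import numpy as np


def pca_basis(X, k):
    """Return the top-k principal directions of X as a (d, k) matrix."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are principal directions, ordered by singular value.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Vt[:k].T


def subspace_align(X_src, X_tgt, k):
    """Align source features to the target subspace (hypothetical helper).

    Returns the aligned source and the projected target, both with k columns.
    """
    P_src = pca_basis(X_src, k)
    P_tgt = pca_basis(X_tgt, k)
    M = P_src.T @ P_tgt  # (k, k) map from the source basis to the target basis
    Z_src = (X_src - X_src.mean(axis=0)) @ P_src @ M
    Z_tgt = (X_tgt - X_tgt.mean(axis=0)) @ P_tgt
    return Z_src, Z_tgt


def sequential_align(steps, k):
    """Align each time-step's features to those of the following time-step."""
    return [subspace_align(steps[t - 1], steps[t], k)
            for t in range(1, len(steps))]


# Toy usage: three time-steps of 50 random 8-dimensional representations.
rng = np.random.default_rng(0)
steps = [rng.normal(size=(50, 8)) for _ in range(3)]
pairs = sequential_align(steps, k=3)
```

A classifier would then be trained on each aligned earlier step (`Z_src`) and applied to the projected later step (`Z_tgt`). The cost is two SVDs and a small matrix product per time-step, which is in keeping with the low computational expense claimed above.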
