TRENDNERT: A Benchmark for Trend and Downtrend Detection in a Scientific Domain

Computational analysis and modeling of the evolution of trends is an important area of research in Natural Language Processing (NLP) because of its socio-economic impact. However, no large publicly available benchmark for trend detection currently exists, which makes a comparative evaluation of methods impossible. We remedy this situation by publishing the benchmark TRENDNERT. It consists of a set of gold trends and downtrends together with document labels, available as an unrestricted download, and a large underlying document collection that can also be obtained for free. We propose Mean Average Precision (MAP) as an evaluation measure for trend detection and apply this measure in an investigation of several baselines.
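To make the proposed measure concrete, here is a minimal sketch of MAP in its standard information-retrieval form. It is an illustration only: the abstract does not specify how detected trends are matched to gold trends, so the function names, the input shape (a ranked list of predicted items plus a set of gold items per run), and the toy labels are all assumptions.

```python
from typing import Hashable, Iterable, Sequence, Set, Tuple


def average_precision(ranked: Sequence[Hashable], gold: Set[Hashable]) -> float:
    """Average precision of one ranked list against a set of gold items.

    Precision is sampled at each rank where a gold item appears and the
    sum is divided by the number of gold items, so gold items that are
    never retrieved lower the score.
    """
    if not gold:
        return 0.0
    hits = 0
    precision_sum = 0.0
    seen = set()
    for rank, item in enumerate(ranked, start=1):
        if item in gold and item not in seen:
            seen.add(item)
            hits += 1
            precision_sum += hits / rank
    return precision_sum / len(gold)


def mean_average_precision(
    runs: Iterable[Tuple[Sequence[Hashable], Set[Hashable]]]
) -> float:
    """Mean of average precision over several (ranked list, gold set) pairs."""
    scores = [average_precision(ranked, gold) for ranked, gold in runs]
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical toy data: ranked candidate topics vs. gold trend labels.
trend_run = (["topic_a", "topic_b", "topic_c", "topic_d"], {"topic_a", "topic_c"})
downtrend_run = (["topic_x", "topic_y"], {"topic_y"})
map_score = mean_average_precision([trend_run, downtrend_run])
```

In this toy case the first run scores (1/1 + 2/3) / 2 and the second scores 1/2, so the mean rewards systems that place gold trends near the top of their ranking.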
