Ontology Forecasting in Scientific Literature: Semantic Concepts Prediction Based on Innovation-Adoption Priors

The ontology engineering research community has long focused on supporting the creation, development and evolution of ontologies. Ontology forecasting, which aims to predict semantic changes in an ontology, is instead a new challenge. In this paper, we contribute to this novel endeavour by focusing on the task of forecasting semantic concepts in the research domain. Ontologies representing scientific disciplines contain only research topics that are already popular enough to have been selected by human experts or automatic algorithms. They are thus unfit to support tasks that require describing and exploring the forefront of research, such as trend detection and horizon scanning. We address this issue by introducing the Semantic Innovation Forecast (SIF) model, which predicts new concepts of an ontology at time t + 1 using only data available at time t. Our approach relies on lexical innovation and adoption information extracted from historical data. We evaluated the SIF model on a very large dataset of over one million scientific papers in the Computer Science domain. The outcomes show that the proposed approach offers a competitive improvement in mean average precision at ten over the baselines when forecasting over 5 years.
