Application of entity linking to identify research fronts and trends

Studying research fronts enables researchers to understand how their academic fields emerged, how they are currently developing, and how they have changed over time. While topic modelling tools help discover themes in documents, they employ a “bag-of-words” approach and require researchers to manually label categories, specify the number of topics a priori, and make assumptions about word distributions in documents. This paper proposes an alternative approach based on entity linking, which links word strings to entities in a knowledge base; by identifying topics automatically from entity mentions, it helps overcome the limitations of “bag-of-words” approaches. To study topic trends and popularity, we use four indicators: the Mann–Kendall test, Sen’s slope analysis, z-scores, and Kleinberg’s burst detection algorithm. Combining these indicators helps us identify which topics are particularly active (“hot” topics), which are declining (“cold” topics or formerly “bursty” topics), and which have matured. We apply the approach and indicators to the fields of Information Science and Accounting.
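The abstract names the four indicators but not how they are computed. The Python sketch below shows one common way to compute each of them for a single topic's yearly mention counts; it is an illustration, not the authors' implementation. Assumptions: the counts come from an entity linker already applied to every paper; the Mann–Kendall variance omits the tie correction; z-scores are computed over whatever values are passed in (the abstract does not say whether values are standardized across years or across topics); and the burst detector is the two-state, batched form of Kleinberg's model with illustrative parameters s = 2 and γ = 1. All function names and the sample counts are hypothetical.

```python
import math
from itertools import combinations
from statistics import mean, median, pstdev


def mann_kendall(x):
    """Mann-Kendall trend test without tie correction; returns (S, Z).
    |Z| > 1.96 indicates a significant trend at the 5% level (two-sided)."""
    n = len(x)
    # S = concordant minus discordant pairs over all i < j
    s = sum((x[j] > x[i]) - (x[j] < x[i]) for i, j in combinations(range(n), 2))
    var_s = n * (n - 1) * (2 * n + 5) / 18
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    return s, z


def sens_slope(x):
    """Sen's slope: median of all pairwise slopes (x[j] - x[i]) / (j - i)."""
    slopes = [(x[j] - x[i]) / (j - i) for i, j in combinations(range(len(x)), 2)]
    return median(slopes)


def z_scores(values):
    """Standardize values as (v - mean) / population standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]


def kleinberg_bursts(r, d, s=2.0, gamma=1.0):
    """Two-state Kleinberg burst detection for batched counts.
    r[t]: papers mentioning the entity in period t; d[t]: all papers in period t.
    Returns one state per period (0 = baseline, 1 = burst)."""
    n = len(r)
    p0 = sum(r) / sum(d)
    if p0 == 0 or p0 >= 1:
        return [0] * n  # no variation in the mention rate to detect
    p = [p0, min(s * p0, 0.9999)]  # burst state emits mentions at s times the base rate
    tau = gamma * math.log(n)  # cost of moving up from state 0 to state 1

    def cost(i, t):
        # Negative log of the binomial likelihood, dropping the constant log C(d[t], r[t])
        return -(r[t] * math.log(p[i]) + (d[t] - r[t]) * math.log(1 - p[i]))

    best = [cost(0, 0), tau + cost(1, 0)]  # the automaton starts in state 0
    back = []
    for t in range(1, n):
        # Moving down (1 -> 0) is free; moving up (0 -> 1) costs tau
        b0 = 0 if best[0] <= best[1] else 1
        b1 = 1 if best[1] <= best[0] + tau else 0
        prev0 = best[b0]
        prev1 = best[1] if b1 == 1 else best[0] + tau
        best = [prev0 + cost(0, t), prev1 + cost(1, t)]
        back.append((b0, b1))
    # Backtrack the cheapest (Viterbi) state sequence
    state = 0 if best[0] <= best[1] else 1
    path = [state]
    for b in reversed(back):
        state = b[state]
        path.append(state)
    return path[::-1]


if __name__ == "__main__":
    # Hypothetical yearly mentions of one linked entity, and total papers per year
    mentions = [1, 1, 1, 10, 10, 1, 1]
    totals = [100] * 7
    print(mann_kendall(mentions), sens_slope(mentions), z_scores(mentions))
    print(kleinberg_bursts(mentions, totals))  # [0, 0, 0, 1, 1, 0, 0]
```

One way to read the outputs together, in line with the abstract's categories: a significant positive Z with a positive Sen's slope points to a rising topic, a significant negative trend to a declining one, and burst states mark the periods in which a topic's share of papers was unusually high.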
