Disclosing Citation Meanings for Augmented Research Retrieval and Exploration

In recent years, new digital technologies have been adopted to support the navigation and analysis of scientific publications, driven by the growing number of articles published every year. Experts therefore rely on online systems to browse thousands of articles in search of relevant information. In this paper, we present a new method that automatically assigns meanings to references on the basis of the citation text, using a Natural Language Processing pipeline and a slightly supervised clustering process. The resulting network of semantically linked articles enables an informed exploration of the research landscape through semantic paths. The proposed approach has been validated on the ACL Anthology Dataset, which contains several thousand papers in the field of Computational Linguistics. A manual evaluation of the extracted citation meanings showed very high accuracy. Finally, we have developed a freely available web-based application and published it online.
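The abstract names the two stages of the method (an NLP pipeline over citation text, then a slightly supervised clustering) but not their internals. The following is a minimal sketch of how such stages could fit together. It is not the authors' implementation. It assumes bag-of-words vectors with cosine similarity in place of the NLP pipeline, seeded k-means as the slightly supervised clustering, and breadth-first search over meaning-filtered citation edges as one reading of "semantic paths". The meaning labels (`uses`, `compares`), the paper IDs, the seed sentences, and all function names are hypothetical.

```python
import math
import re
from collections import Counter, deque


def vectorize(text):
    """Bag-of-words term counts; a crude stand-in for a real NLP pipeline."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term->weight mappings."""
    dot = sum(w * b[t] for t, w in a.items() if t in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors):
    """Mean term-weight vector of a non-empty list of vectors."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return {t: c / len(vectors) for t, c in total.items()}


def seeded_clustering(seeds, contexts, iterations=5):
    """Seeded k-means (assumed stand-in for the paper's clustering).

    Each meaning label starts from a few hand-labeled seed sentences (the
    light supervision). Unlabeled citation contexts go to the nearest
    centroid; centroids are then recomputed from seeds plus assigned
    contexts. Every label needs at least one seed sentence.
    """
    seed_vecs = {lab: [vectorize(s) for s in sents] for lab, sents in seeds.items()}
    centroids = {lab: centroid(vs) for lab, vs in seed_vecs.items()}
    vecs = [vectorize(c) for c in contexts]
    labels = []
    for _ in range(iterations):
        labels = [max(centroids, key=lambda lab: cosine(v, centroids[lab])) for v in vecs]
        centroids = {
            lab: centroid(vs + [v for v, l in zip(vecs, labels) if l == lab])
            for lab, vs in seed_vecs.items()
        }
    return labels


def semantic_path(edges, source, target, allowed):
    """Shortest citing->cited path using only edges whose meaning is allowed."""
    graph = {}
    for citing, cited, meaning in edges:
        if meaning in allowed:
            graph.setdefault(citing, []).append(cited)
    prev = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = prev[node]
            return path[::-1]
        for nxt in graph.get(node, []):
            if nxt not in prev:
                prev[nxt] = node
                queue.append(nxt)
    return None  # no path through the allowed meanings


# Hypothetical toy data: meaning seeds and (citing, cited, citation sentence).
seeds = {
    "uses": ["we use the parser of", "we adopt the toolkit"],
    "compares": ["our results outperform", "compared with the baseline of"],
}
citations = [
    ("P3", "P2", "we use the toolkit described in"),
    ("P2", "P1", "our method outperforms the baseline of"),
    ("P3", "P1", "we adopt the parser of"),
]
labels = seeded_clustering(seeds, [text for _, _, text in citations])
edges = [(src, dst, lab) for (src, dst, _), lab in zip(citations, labels)]
print(labels)                                           # ['uses', 'compares', 'uses']
print(semantic_path(edges, "P3", "P1", {"uses"}))       # ['P3', 'P1']
print(semantic_path(edges, "P2", "P1", {"uses"}))       # None
print(semantic_path(edges, "P2", "P1", {"compares"}))   # ['P2', 'P1']
```

In this sketch, the seed sentences are the only supervision, which is why the clustering is only lightly supervised. Filtering edges by meaning before the search means that the same pair of papers can be connected or not depending on which citation meanings the user chooses to follow.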
