A Language-based Approach to Measuring Scholarly Impact

Identifying the most influential documents in a corpus is an important problem in many fields, from information science and historiography to text summarization and news aggregation. Unfortunately, traditional bibliometrics such as citations are often not available. We propose using changes in the thematic content of documents over time to measure the importance of individual documents within the collection. We describe a dynamic topic model for both quantifying and qualifying the impact of these documents. We validate the model by analyzing three large corpora of scientific articles. Our measurement of a document's impact correlates significantly with its number of citations.
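The validation step, comparing a per-document impact score against citation counts, can be illustrated with a rank correlation. The sketch below is only an illustration under stated assumptions: the choice of Spearman's coefficient, the function names, and the toy numbers are all hypothetical and are not taken from the paper, which does not specify here how the correlation was computed.

```python
# Minimal sketch: Spearman rank correlation between hypothetical
# impact scores and citation counts. All data below are made up
# for illustration; they are not results from the paper.

def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = avg
        i = j + 1
    return result

def pearson(x, y):
    """Pearson correlation of two equal-length, non-constant sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

# Hypothetical per-document impact scores and citation counts.
influence = [0.82, 0.10, 0.45, 0.05, 0.60]
citations = [120, 3, 40, 8, 35]

rho = spearman(influence, citations)
print(f"Spearman rho = {rho:.2f}")  # prints "Spearman rho = 0.80"
```

A rank-based measure is used in this sketch because citation counts tend to be heavily skewed, so comparing orderings is often more robust than comparing raw values. Judging significance would additionally require a test or a permutation procedure, which is omitted here.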
