Scientific impact at the topic level: A case study in computational linguistics

In this article, we apply a latent topic model together with a topic-level eigenfactor (TEF) algorithm to assess the relative importance of academic entities, including articles, authors, journals, and conferences. Scientific impact is measured by a PageRank score biased toward the topics produced by the latent topic model. The TEF metric thus captures the impact of an academic entity in multiple topic-specific (granular) views as well as in a global view. Experiments on a computational linguistics corpus show that the method is a useful and promising measure of scientific impact.
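As a rough illustration of a topic-biased PageRank score, the sketch below runs a random walk over a toy citation graph in which the teleport step is weighted by each paper's probability under one topic. It is not the article's TEF implementation: the function name `topic_biased_pagerank`, the paper identifiers, the topic weights, the damping factor of 0.85, and the use of a uniform teleport vector for the global view are all assumptions made for this example.

```python
from typing import Dict, List


def topic_biased_pagerank(
    citations: Dict[str, List[str]],
    topic_weights: Dict[str, float],
    damping: float = 0.85,
    iterations: int = 100,
) -> Dict[str, float]:
    """Power-iteration PageRank whose teleport step is biased by topic weights.

    citations maps each paper to the papers it cites.
    topic_weights maps each paper to its (hypothetical) weight under one topic.
    """
    nodes = sorted(set(citations) | {c for cited in citations.values() for c in cited})
    total = sum(topic_weights.get(n, 0.0) for n in nodes)
    if total <= 0:
        raise ValueError("topic_weights must give positive mass to at least one paper")

    # Normalised teleport vector: the walker jumps preferentially to on-topic papers.
    teleport = {n: topic_weights.get(n, 0.0) / total for n in nodes}
    scores = dict(teleport)

    for _ in range(iterations):
        new_scores = {n: (1.0 - damping) * teleport[n] for n in nodes}
        for n in nodes:
            cited = citations.get(n, [])
            if cited:
                # Split this paper's score evenly among the papers it cites.
                share = damping * scores[n] / len(cited)
                for c in cited:
                    new_scores[c] += share
            else:
                # Dangling paper (cites nothing): redistribute via the teleport vector.
                for m in nodes:
                    new_scores[m] += damping * scores[n] * teleport[m]
        scores = new_scores
    return scores


# Toy data (hypothetical): four papers and their weights under two topics.
citations = {"A": ["C"], "B": ["C", "D"], "C": ["D"], "D": []}
topics = {
    "parsing": {"A": 0.9, "B": 0.1, "C": 0.8, "D": 0.1},
    "translation": {"A": 0.1, "B": 0.9, "C": 0.2, "D": 0.9},
}

# One granular view per topic.
per_topic = {name: topic_biased_pagerank(citations, w) for name, w in topics.items()}

# Global view, taken here as ordinary PageRank with a uniform teleport vector.
global_view = topic_biased_pagerank(citations, {p: 1.0 for p in "ABCD"})

for name, scores in per_topic.items():
    print(name, sorted(scores, key=scores.get, reverse=True))
print("global", sorted(global_view, key=global_view.get, reverse=True))
```

Each call returns scores that sum to 1, so the per-topic rankings are comparable across topics. Scores for authors, journals, or conferences could be obtained by aggregating paper scores, though the aggregation rule is not specified in the abstract.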
