Exploiting heterogeneous scientific literature networks to combat ranking bias: Evidence from the computational linguistics area

Helping researchers find valuable papers in a large literature collection is an important problem, and many graph-based ranking algorithms have been proposed to address it. However, most of these algorithms suffer from ranking bias: they return ranking lists with an undesirable time distribution (for example, lists that over-represent older papers, which have had more time to accumulate citations), and this reduces their usefulness. This paper focuses on alleviating ranking bias by leveraging the heterogeneous network structure of the literature collection. We propose MutualRank, a new graph-based ranking algorithm that integrates the mutual reinforcement relationships among the networks of papers, researchers, and venues to produce a more comprehensive, more accurate, and less biased ranking than previous methods. MutualRank is a unified model that uses both intra-network and inter-network information to rank papers, researchers, and venues simultaneously. We use the ACL Anthology Network as the benchmark data set and construct the gold standard from the computational linguistics course websites of well-known universities and from two well-known textbooks. Experimental results show that, for ranking papers, MutualRank substantially outperforms state-of-the-art competitors, including PageRank, HITS, CoRank, FutureRank, and P-Rank, both in ranking effectiveness and in alleviating ranking bias. The rankings of researchers and venues produced by MutualRank are also reasonable.
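The abstract does not state MutualRank's update equations, so the following Python sketch only illustrates the general idea of mutual reinforcement across a heterogeneous network of papers, authors, and venues. Everything in it is an assumption made for illustration: the hypothetical helpers `co_rank`, `pagerank_step`, and `propagate`, the even split of scores across authorship and publication edges, the weight `alpha` between the citation signal and the author/venue signal, and the fixed iteration count. It is a generic PageRank-plus-propagation scheme, not the published algorithm.

```python
from collections import defaultdict


def normalize(scores):
    """Rescale a score dict so its values sum to 1 (unchanged if all zero)."""
    total = sum(scores.values())
    return {k: s / total for k, s in scores.items()} if total else dict(scores)


def pagerank_step(scores, out_links, damping=0.85):
    """One PageRank power-iteration step on a directed graph.

    out_links maps a node to the nodes it points to (a citing paper to the
    papers it cites). Dangling nodes (no out-links) spread their score
    uniformly over all nodes.
    """
    n = len(scores)
    new = {x: (1.0 - damping) / n for x in scores}
    dangling = sum(s for x, s in scores.items() if not out_links.get(x))
    for x, s in scores.items():
        targets = out_links.get(x, [])
        for t in targets:
            new[t] += damping * s / len(targets)
    for x in new:
        new[x] += damping * dangling / n
    return new


def propagate(src_scores, edges, dst_nodes):
    """Pass scores across a bipartite relation given as (source, target) pairs.

    Each source node splits its score evenly among its targets; the result
    is normalized over dst_nodes.
    """
    degree = defaultdict(int)
    for s, _ in edges:
        degree[s] += 1
    out = {d: 0.0 for d in dst_nodes}
    for s, d in edges:
        out[d] += src_scores[s] / degree[s]
    return normalize(out)


def co_rank(papers, authors, venues, citations, writes, published_in,
            alpha=0.5, iters=50):
    """Hypothetical mutual-reinforcement co-ranking (not the published MutualRank).

    citations: dict paper -> list of cited papers (intra-network links)
    writes: list of (paper, author); published_in: list of (paper, venue)
    alpha: weight of the citation signal versus the author/venue signal
    """
    p = {x: 1.0 / len(papers) for x in papers}
    a = {x: 1.0 / len(authors) for x in authors}
    v = {x: 1.0 / len(venues) for x in venues}
    author_to_paper = [(au, pa) for pa, au in writes]
    venue_to_paper = [(ve, pa) for pa, ve in published_in]
    for _ in range(iters):
        intra = pagerank_step(p, citations)             # paper citation network
        from_a = propagate(a, author_to_paper, papers)  # authors -> papers
        from_v = propagate(v, venue_to_paper, papers)   # venues -> papers
        p = normalize({x: alpha * intra[x]
                       + (1 - alpha) * 0.5 * (from_a[x] + from_v[x])
                       for x in papers})
        a = propagate(p, writes, authors)               # papers -> authors
        v = propagate(p, published_in, venues)          # papers -> venues
    return p, a, v


if __name__ == "__main__":
    papers = ["P1", "P2", "P3", "P4"]
    citations = {"P2": ["P1"], "P3": ["P1", "P2"], "P4": ["P1"]}
    writes = [("P1", "A"), ("P2", "A"), ("P3", "B"), ("P4", "B")]
    published_in = [("P1", "V1"), ("P2", "V1"), ("P3", "V2"), ("P4", "V2")]
    p, a, v = co_rank(papers, ["A", "B"], ["V1", "V2"],
                      citations, writes, published_in)
    for name, scores in (("papers", p), ("authors", a), ("venues", v)):
        print(name, sorted(scores, key=scores.get, reverse=True))
```

On this toy graph, P1, which every other paper cites, ranks first, and author A and venue V1, attached to the two cited papers, outrank B and V2. The uncited papers P3 and P4 still receive a nonzero score through their author and venue. One plausible intuition for how inter-network information can soften ranking bias is that a new paper inherits some credit from its authors and venue before its citations accumulate.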
