Topic-driven multi-type citation network analysis

In every scientific field, automated citation analysis makes it possible to estimate the importance or reputation of publications and authors. In this paper, we focus on the task of ranking authors. Previous work has used either content-based approaches or link analysis of citation networks, but combining the two through topical link analysis remains unexplored. Moreover, previous citation analysis applications are typically limited to a graph of author citations or a bipartite graph of author and paper citations. We present a novel integrated probabilistic model that combines a content-based approach with a multi-type citation network covering citations among papers, authors, affiliations, and publishing venues. Because researchers may be experts in different scientific domains, we further apply Topical PageRank to citation network link analysis. Finally, we describe a heterogeneous link analysis of the citation network and explore the impact of weighting various factors. Comparative experiments on data extracted from the ACM Digital Library show that 1) the multi-type citation graph works better than citation graphs integrating fewer types of entities, 2) the use of Topical PageRank can further improve performance, and 3) Heterogenous PageRank with parameter tuning can work even better than Topical PageRank.
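
To make the idea of link analysis over a multi-type citation graph concrete, the following is a minimal sketch of a generic weighted PageRank in which each edge type carries its own weight. It is not the model proposed in the paper: the function name `weighted_pagerank`, the edge types (`cites`, `written_by`, `published_in`), the weight values, and the toy graph are all assumptions chosen for illustration, and the topical (per-topic) decomposition is left out entirely.

```python
from collections import defaultdict


def weighted_pagerank(edges, type_weights, damping=0.85, iters=200, tol=1e-12):
    """Power-iteration PageRank over typed, weighted edges.

    edges: list of (source, target, edge_type) tuples.
    type_weights: dict mapping edge_type to a positive weight (hypothetical values).
    """
    nodes = sorted({n for s, t, _ in edges for n in (s, t)})
    n = len(nodes)

    # Accumulate outgoing weight per (source, target) pair.
    out = defaultdict(dict)
    for s, t, etype in edges:
        out[s][t] = out[s].get(t, 0.0) + type_weights[etype]

    rank = {v: 1.0 / n for v in nodes}
    for _ in range(iters):
        # Teleportation share, spread uniformly.
        new = {v: (1.0 - damping) / n for v in nodes}
        dangling = 0.0
        for v in nodes:
            total = sum(out[v].values())
            if total == 0.0:
                # Node without out-links: its rank is redistributed uniformly.
                dangling += rank[v]
                continue
            for t, w in out[v].items():
                new[t] += damping * rank[v] * w / total
        for v in nodes:
            new[v] += damping * dangling / n

        delta = sum(abs(new[v] - rank[v]) for v in nodes)
        rank = new
        if delta < tol:
            break
    return rank


# Toy multi-type graph (illustrative only): papers p*, authors a*, venue v1.
EDGES = [
    ("p2", "p1", "cites"),
    ("p3", "p1", "cites"),
    ("p3", "p2", "cites"),
    ("p1", "a1", "written_by"),
    ("p2", "a2", "written_by"),
    ("p3", "a2", "written_by"),
    ("p1", "v1", "published_in"),
    ("p2", "v1", "published_in"),
]

# Hypothetical per-type weights; tuning them is the kind of parameter
# exploration the abstract attributes to heterogeneous link analysis.
TYPE_WEIGHTS = {"cites": 1.0, "written_by": 0.5, "published_in": 0.25}

ranks = weighted_pagerank(EDGES, TYPE_WEIGHTS)
author_ranking = sorted(
    (v for v in ranks if v.startswith("a")), key=ranks.get, reverse=True
)
```

Here `author_ranking` lists the author nodes by descending score. Changing the entries of `TYPE_WEIGHTS` shifts how much rank flows from papers to authors versus venues, which is one simple way to experiment with the weighting of different link types.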
