Link fusion: a unified link analysis framework for multi-type interrelated data objects

Web link analysis has proven to be a significant enhancement for quality-based web search. Most existing links can be classified into two categories: intra-type links (e.g., web hyperlinks), which represent the relationship of data objects within a homogeneous data type (web pages), and inter-type links (e.g., user browsing logs), which represent the relationship of data objects across different data types (users and web pages). Unfortunately, most link analysis research considers only one type of link. In this paper, we propose a unified link analysis framework, called "link fusion", which considers both the inter- and intra-type link structure among multi-type interrelated data objects and brings order to the objects in each data type at the same time. The PageRank and HITS algorithms are shown to be special cases of our unified link analysis framework. Experiments on an instantiation of the framework that uses the user data and web pages extracted from a proxy log show that our proposed algorithm improves search effectiveness over the HITS and DirectHit algorithms by 24.6% and 38.2%, respectively.
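
As an illustration of the idea of combining intra- and inter-type links, the following is a minimal sketch of a mutual-reinforcement iteration over two data types (web pages and users). It is not the update rule from the paper, which the abstract does not state. The function names (`propagate`, `fuse`), the weights `alpha` and `beta`, the even splitting of a score across out-links, and the toy data are all assumptions made for this example.

```python
def normalize(scores):
    """Scale scores so they sum to 1 (left unchanged if the sum is zero)."""
    total = sum(scores)
    return [s / total for s in scores] if total > 0 else scores


def propagate(links, scores, n_targets):
    """Push each source's score along its out-links, split evenly among targets."""
    out = [0.0] * n_targets
    for src, targets in links.items():
        if targets:
            share = scores[src] / len(targets)
            for t in targets:
                out[t] += share
    return out


def fuse(page_links, user_visits, n_pages, n_users,
         alpha=0.5, beta=0.5, iters=50):
    """Hypothetical two-type iteration (an assumption, not the paper's formula).

    page_links:  intra-type links, page -> list of pages it links to
    user_visits: inter-type links, user -> list of pages the user visited
    alpha, beta: assumed weights for intra-type and inter-type contributions
    """
    pages = [1.0 / n_pages] * n_pages
    users = [1.0 / n_users] * n_users

    # Reverse inter-type links: page -> users who visited it.
    visited_by = {p: [] for p in range(n_pages)}
    for user, visited in user_visits.items():
        for page in visited:
            visited_by[page].append(user)

    for _ in range(iters):
        from_pages = propagate(page_links, pages, n_pages)   # intra-type
        from_users = propagate(user_visits, users, n_pages)  # inter-type
        new_pages = normalize(
            [alpha * a + beta * b for a, b in zip(from_pages, from_users)]
        )
        # Users are scored by the pages they visit (inter-type only here).
        new_users = normalize(propagate(visited_by, pages, n_users))
        pages, users = new_pages, new_users
    return pages, users


# Toy data: three pages linked in a cycle, two users with browsing records.
page_links = {0: [1], 1: [2], 2: [0]}
user_visits = {0: [2], 1: [1, 2]}
pages, users = fuse(page_links, user_visits, n_pages=3, n_users=2)
print("page scores:", pages)
print("user scores:", users)
```

In this sketch, setting `alpha=1` and `beta=0` leaves only propagation over hyperlinks, which resembles a PageRank-style iteration without the random-jump term; it is meant only to convey how a single-type analysis can appear as a restricted setting of a multi-type one.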
