Paper information - Link analysis ranking

Link analysis ranking

The explosive growth and widespread accessibility of the Web have led to a surge of research activity in information retrieval on the World Wide Web. Ranking has always been an important component of any information retrieval system, and in Web search its importance becomes critical: given the size of the Web, it is imperative to have ranking functions that capture user needs. To this end, the Web offers a rich context of information expressed through its hyperlinks. In this thesis we investigate, theoretically and experimentally, the application of Link Analysis to ranking on the Web. Building upon the framework of hubs and authorities [57], we propose new families of Link Analysis Ranking algorithms. Some of the algorithms we define no longer enjoy the linearity property of the previous algorithms. As a result, they are harder to analyze, and it is harder even to prove that they converge. However, for a special case of the families we consider, we are able to prove that the algorithm converges, and we provide a complete characterization of the combinatorial properties of the stationary authority weights it produces. The plethora of Link Analysis Ranking algorithms creates the need for a formal way to evaluate their properties and compare their behavior. We introduce a theoretical framework for the study of Link Analysis Ranking algorithms, and we define specific properties of the algorithms within this framework. Using these properties, we provide an axiomatic characterization of the INDEGREE algorithm, which ranks pages according to the number of incoming links. We conclude the thesis with an extensive experimental evaluation of Link Analysis Ranking. We test the algorithms over multiple queries, and we use user feedback to determine their quality. Our experiments reveal some of the limitations of Link Analysis Ranking. Specifically, it appears that, for most algorithms, the nodes and structures in the graph that they favor do not correspond to the most relevant pages in the collection. These observations offer new insight into the mechanics of the algorithms, and we believe that they will lead to improved algorithm design and better input graphs for the algorithms.
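To make the two ranking ideas named in the abstract concrete, the following is a minimal sketch and not the thesis's own implementation. It assumes a graph given as a list of (source, target) link pairs. INDEGREE follows the abstract's description (rank by number of incoming links). The hubs-and-authorities routine uses the commonly taught mutual-reinforcement iteration with L2 normalization. The function names, the example graph, the tie-breaking rule, the handling of self-links, and the iteration count are all illustrative assumptions.

```python
from collections import defaultdict
from math import sqrt

# Hypothetical toy graph: each pair (u, v) means "page u links to page v".
EXAMPLE_EDGES = [
    ("a", "c"), ("b", "c"), ("c", "d"),
    ("a", "d"), ("b", "d"), ("d", "a"),
]


def indegree_ranking(edges):
    """Rank nodes by the number of distinct pages linking to them.

    Assumptions: self-links are ignored and ties are broken by node name.
    """
    nodes = set()
    incoming = defaultdict(set)
    for src, dst in edges:
        nodes.update((src, dst))
        if src != dst:
            incoming[dst].add(src)
    scores = {n: len(incoming[n]) for n in nodes}
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


def _normalize(weights):
    """Scale a weight dictionary to unit L2 norm (left unchanged if all zero)."""
    norm = sqrt(sum(v * v for v in weights.values()))
    if norm == 0.0:
        return dict(weights)
    return {n: v / norm for n, v in weights.items()}


def hubs_and_authorities(edges, iterations=50):
    """Iterate the linear hub/authority updates for a fixed number of rounds.

    authority(v) = sum of hub(u) over links u -> v
    hub(u)       = sum of authority(v) over links u -> v
    """
    nodes = sorted({n for edge in edges for n in edge})
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_auth[dst] += hub[src]
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in edges:
            new_hub[src] += new_auth[dst]
        auth = _normalize(new_auth)
        hub = _normalize(new_hub)
    return hub, auth


if __name__ == "__main__":
    print(indegree_ranking(EXAMPLE_EDGES))
    hub_weights, authority_weights = hubs_and_authorities(EXAMPLE_EDGES)
    print(sorted(authority_weights.items(), key=lambda kv: -kv[1]))
```

Both updates in this sketch are linear in the weights, which is the linearity property the abstract says some of the newly proposed families give up. A fixed iteration count is used here for simplicity. A convergence test on successive weight vectors would be a reasonable alternative.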

Allan Borodin | Panayiotis Tsaparas

[1] Santosh S. Vempala, et al. On clusterings: Good, bad and spectral, 2004, JACM.

[2] Matthew Richardson, et al. The Intelligent surfer: Probabilistic Combination of Link and Content Information in PageRank, 2001, NIPS.

[3] Ben Shneiderman, et al. Structural analysis of hypertexts: identifying hierarchies and useful metrics, 1992, TOIS.

[4] Jon M. Kleinberg, et al. Automatic Resource Compilation by Analyzing Hyperlink Structure and Associated Text, 1998, Comput. Networks.

[5] Ravi Kumar, et al. Trawling the Web for Emerging Cyber-Communities, 1999, Comput. Networks.

[6] Ronald Fagin, et al. Searching the workplace web, 2003, WWW '03.

[7] Michael I. Jordan, et al. On Spectral Clustering: Analysis and an algorithm, 2001, NIPS.

[8] Christoph Braun, et al. Coherence of gamma-band EEG activity as a basis for associative learning, 1999, Nature.

[9] Ronald Fagin, et al. Comparing and aggregating rankings with ties, 2004, PODS '04.

[10] R. Devaney. An Introduction to Chaotic Dynamical Systems, 1990.

[11] Krishna Bharat, et al. When experts agree: using non-affiliated experts to rank popular topics, 2002, ACM Trans. Inf. Syst.