Analysis of anchor text for web search

Anchor text in web documents has been observed to improve the quality of web text search for some classes of queries. By examining properties of anchor text in a large intranet, we hope to shed light on why this is the case. Our main premise is that anchor text behaves very much like real user queries and consensus titles. Thus, understanding how anchor text relates to a document will likely lead to a better understanding of how to translate a user’s query into high-quality search results. Our approach is experimental, based on a study of a large corporate intranet, including its content as well as a large stream of queries against that content. We conduct experiments to investigate several aspects of anchor text, including its relationship to titles, the frequency of queries that can be satisfied by anchor text alone, and the homogeneity of results fetched by anchor text.
