Combining Multiple Sources of Evidence to Enhance Web Search Performance

The Web is rich with sources of information that go beyond the contents of documents, including hyperlinks and manually classified directories of Web documents such as Yahoo. Past fusion studies in information retrieval (IR) have repeatedly shown that combining multiple sources of evidence (i.e., fusion) can improve retrieval performance. This research extends that work by investigating the effects of combining three distinct retrieval approaches for Web IR: the text-based approach, which leverages document texts; the link-based approach, which leverages hyperlinks; and the classification-based approach, which leverages Yahoo categories. Retrieval results of the text-, link-, and classification-based methods were combined using variations of the linear combination formula to produce fusion results, which were compared to the individual retrieval results using traditional retrieval evaluation metrics. Fusion results were also examined to ascertain the significance of overlap (i.e., the number of systems that retrieve a document) in fusion. The analysis suggests that the solution spaces of the text-, link-, and classification-based retrieval methods are diverse enough for fusion to be beneficial. It also reveals important characteristics of the fusion environment, such as the effects of system parameters and the relationship between overlap, document ranking, and relevance.
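As a rough illustration of what a linear combination of retrieval results can look like, the sketch below fuses per-system document scores with a weighted sum and records each document's overlap (the number of systems that retrieved it). The abstract does not specify the formula variations, the score normalization, or the weights used in the study, so the min-max normalization, the function names (`min_max_normalize`, `linear_fusion`), and the weights here are illustrative assumptions only.

```python
from collections import defaultdict


def min_max_normalize(scores):
    """Rescale one system's scores to [0, 1] (an assumed normalization)."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores identical: treat every retrieved document equally.
        return {doc: 1.0 for doc in scores}
    return {doc: (s - lo) / (hi - lo) for doc, s in scores.items()}


def linear_fusion(runs, weights):
    """Fuse runs as a weighted sum of normalized scores.

    runs:    {system_name: {doc_id: raw_score}}
    weights: {system_name: weight} (hypothetical values, not from the study)
    Returns a list of (doc_id, fused_score, overlap), best first.
    """
    fused = defaultdict(float)
    overlap = defaultdict(int)
    for name, scores in runs.items():
        for doc, s in min_max_normalize(scores).items():
            fused[doc] += weights[name] * s
            overlap[doc] += 1  # one more system retrieved this document
    # Sort by descending fused score; break ties by document id.
    ranking = sorted(fused, key=lambda d: (-fused[d], d))
    return [(d, fused[d], overlap[d]) for d in ranking]


if __name__ == "__main__":
    # Toy scores for text-, link-, and classification-based systems.
    runs = {
        "text": {"d1": 2.0, "d2": 1.0, "d3": 0.0},
        "link": {"d2": 10.0, "d4": 5.0},
        "class": {"d1": 0.5, "d2": 0.5},
    }
    weights = {"text": 1.0, "link": 1.0, "class": 1.0}
    for doc, score, n_systems in linear_fusion(runs, weights):
        print(doc, round(score, 3), n_systems)
```

In this toy example a document retrieved by all three systems accumulates score from each, which is one simple way overlap can influence the fused ranking; changing the weights shifts how much each source of evidence contributes.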
