Combining Web Document Representations in a Bayesian Inference Network Model Using Link and Content-Based Evidence

This paper introduces an expressive formal Information Retrieval (IR) model developed for the Web. It is based on the Bayesian inference network model and views IR as an evidential reasoning process. It supports the explicit combination of multiple Web document representations under a single framework. Information extracted from the content of Web documents and derived from the analysis of the Web link structure is used as a source of evidence in support of the ranking algorithm. This content-based and link-based evidence is utilised to generate the multiple Web document representations used in the combination.
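As a rough illustration of what combining several document representations can look like, the sketch below merges per-representation belief values with a normalised weighted sum, one operator commonly used in inference network retrieval. The abstract does not state which combination operator the model uses, so the operator, the function name `combine_beliefs`, the representation names and all numeric values are assumptions made for illustration only.

```python
from typing import Dict


def combine_beliefs(beliefs: Dict[str, float], weights: Dict[str, float]) -> float:
    """Combine per-representation beliefs (each in [0, 1]) by a weighted sum.

    Assumption: a normalised weighted sum stands in for the model's
    combination step; the abstract does not specify the operator.
    """
    total_weight = sum(weights[name] for name in beliefs)
    if total_weight <= 0:
        raise ValueError("weights must sum to a positive value")
    weighted = sum(weights[name] * beliefs[name] for name in beliefs)
    return weighted / total_weight


# Hypothetical beliefs that one document satisfies a query, one value per
# representation (content-based and link-based). Values are made up.
doc_beliefs = {"content": 0.6, "anchor_text": 0.8, "link_structure": 0.4}

# Hypothetical weights expressing how much each representation is trusted.
rep_weights = {"content": 2.0, "anchor_text": 1.0, "link_structure": 1.0}

score = combine_beliefs(doc_beliefs, rep_weights)
```

Because the weights are normalised, the combined score stays in [0, 1] whenever each input belief does, so documents can be ranked directly by it.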
