Paper information - Selective web information retrieval

Selective web information retrieval

This thesis proposes selective Web information retrieval, a framework formulated in terms of statistical decision theory that aims to apply an appropriate retrieval approach on a per-query basis. The main component of the framework is a decision mechanism that selects the retrieval approach for each query. The selection is based on the outcome of an experiment, which is performed before the final ranking of the retrieved documents. The experiment is a process that extracts features from a sample of the set of retrieved documents. This thesis investigates three broad types of experiments. The first counts the occurrences of query terms in the retrieved documents, indicating the extent to which the query topic is covered in the document collection. The second considers information from the distribution of retrieved documents in larger aggregates of related Web documents, such as whole Web sites or directories within Web sites. The third estimates the usefulness of the hyperlink structure among a sample of the set of retrieved Web documents. The proposed experiments are evaluated in the context of both informational and navigational search tasks with an optimal Bayesian decision mechanism, where it is assumed that relevance information exists. This thesis further investigates the implications of applying selective Web information retrieval in an operational setting, where the tuning of a decision mechanism is based on limited existing relevance information and the information retrieval system’s input is a stream of queries related to mixed informational and navigational search tasks. First, the experiments are evaluated using different training and testing query sets, as well as a mixture of different types of queries. Second, query sampling is introduced in order to approximate the queries that a retrieval system receives, and to tune an ad-hoc decision mechanism with a broad set of automatically sampled queries.
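As an illustration only, the per-query selection described in the abstract could be sketched as follows. This is not the thesis's implementation: the function names, the binning of the experiment outcome, the approach labels, and the choice of loss (one minus a retrieval effectiveness score) are all assumptions made for the example. The first function is a hypothetical experiment of the first type (query-term coverage in a sample of retrieved documents). The other two form a simple empirical Bayes decision rule that, for each outcome, picks the approach with the highest mean effectiveness on training queries, which is equivalent to the lowest expected loss under the assumed loss.

```python
from collections import defaultdict


def term_coverage_experiment(query_terms, sampled_docs, n_bins=2):
    """Hypothetical experiment: bin the fraction of sampled documents
    (each given as a set of tokens) that contain every query term."""
    if not sampled_docs:
        return 0
    hits = sum(all(t in doc for t in query_terms) for doc in sampled_docs)
    fraction = hits / len(sampled_docs)
    # Map the fraction in [0, 1] to an integer bin in [0, n_bins - 1].
    return min(int(fraction * n_bins), n_bins - 1)


def train_decision_table(training, approaches):
    """Build a table: experiment outcome -> approach with the highest
    mean effectiveness among training queries with that outcome.

    training: list of (outcome, {approach: effectiveness}) pairs, where
    effectiveness is assumed to come from existing relevance information.
    """
    totals = defaultdict(lambda: defaultdict(float))
    counts = defaultdict(int)
    for outcome, scores in training:
        counts[outcome] += 1
        for approach in approaches:
            totals[outcome][approach] += scores[approach]
    return {
        outcome: max(approaches, key=lambda a: totals[outcome][a] / counts[outcome])
        for outcome in counts
    }


def select_approach(table, outcome, default):
    """Pick the approach for a new query; fall back to a default
    when the outcome was never observed in training."""
    return table.get(outcome, default)


# Assumed approach labels and made-up effectiveness scores.
approaches = ["content", "content+links"]
training = [
    (0, {"content": 0.2, "content+links": 0.5}),
    (0, {"content": 0.3, "content+links": 0.4}),
    (1, {"content": 0.6, "content+links": 0.3}),
]
table = train_decision_table(training, approaches)
sample = [{"web", "search", "engine"}, {"web"}]
outcome = term_coverage_experiment(["web", "search"], sample)
chosen = select_approach(table, outcome, default="content")
```

In this toy setup the decision table is tuned on a handful of labelled queries. The operational setting described above, with limited relevance information and sampled queries, would change how the training pairs are obtained rather than the selection rule itself.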

Vasileios Plachouras
