Combinatoric Models of Information Retrieval Ranking Methods and Performance Measures for Weakly-Ordered Document Collections

LEWIS CHURCH: Combinatoric Models of Information Retrieval Ranking Methods and Performance Measures for Weakly-Ordered Document Collections (Under the direction of Robert M. Losee)

This dissertation answers three research questions: (1) What are the characteristics of a combinatoric measure, based on the Average Search Length (ASL), that performs the same as a probabilistic version of the ASL? (2) Does the combinatoric ASL measure produce the same performance result as the one obtained by ranking a collection of documents and calculating the ASL empirically? (3) When do the ASL and either the Expected Search Length, the MZ-based E, or the Mean Reciprocal Rank measure both imply that one document ranking is better than another?

Concepts and techniques from enumerative combinatorics and other branches of mathematics were used to develop combinatoric models and equations for several information retrieval ranking methods and performance measures. Empirical, statistical, and simulation methods were used to validate these models and equations. The variants of the document cut-off performance measure equations developed in this dissertation can be used for performance prediction and to study any vector V of ranked documents at arbitrary document cut-off points. Two conditions must hold. First, relevance must be binary. Second, the following information must be recoverable from the ranked output: the document equivalence classes and their relative sequence, the number of documents in each equivalence class, and the number of relevant documents that each class contains. The performance measure equations yielded correct values for both strongly- and weakly-ordered document collections.

Lewis Church
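The abstract says the cut-off equations need only three inputs: the sequence of equivalence classes, the size of each class, and the number of relevant documents in each class. The Python sketch below shows how expected measure values can be computed from those inputs alone. It rests on two assumptions. First, documents tied within a class are presented in uniformly random order. Second, the ASL is taken to be the mean 1-based rank of the relevant documents. This is not the dissertation's own set of equations. The names `expected_asl`, `expected_relevant_at_cutoff`, and `cooper_esl` are hypothetical, and the ESL function uses the classic weak-ordering form of Cooper's Expected Search Length.

```python
from fractions import Fraction
from typing import Sequence, Tuple

# Hypothetical input model: one (size, relevant_count) pair per equivalence
# class, listed in rank order. Documents tied inside a class are assumed to be
# shown in uniformly random order; relevance is binary.
Classes = Sequence[Tuple[int, int]]


def expected_asl(classes: Classes) -> Fraction:
    """Expected mean 1-based rank of the relevant documents (assumed ASL definition)."""
    offset = 0            # documents ranked ahead of the current class
    total = Fraction(0)   # sum of the expected ranks of relevant documents
    n_rel = 0
    for n, r in classes:
        if not 0 <= r <= n:
            raise ValueError("each class needs 0 <= relevant <= size")
        # A uniformly random slot among ranks offset+1 .. offset+n has mean offset + (n + 1) / 2.
        total += r * (offset + Fraction(n + 1, 2))
        n_rel += r
        offset += n
    if n_rel == 0:
        raise ValueError("no relevant documents")
    return total / n_rel


def expected_relevant_at_cutoff(classes: Classes, k: int) -> Fraction:
    """Expected number of relevant documents among the top k ranks."""
    seen = Fraction(0)
    remaining = k
    for n, r in classes:
        if remaining <= 0:
            break
        if n == 0:
            continue
        m = min(n, remaining)  # documents taken from this class
        # Hypergeometric mean: m of n documents drawn at random, r of the n relevant.
        seen += Fraction(m * r, n)
        remaining -= m
    return seen


def cooper_esl(classes: Classes, wanted: int) -> Fraction:
    """Cooper's Expected Search Length for a weak ordering: expected number of
    non-relevant documents examined before `wanted` relevant ones are found."""
    j = 0             # non-relevant documents in classes examined in full
    needed = wanted   # relevant documents still to be found
    for n, r in classes:
        i = n - r     # non-relevant documents in this class
        if r >= needed:
            # Stop inside this class: ESL = j + i * s / (r + 1), with s = needed.
            return j + Fraction(i * needed, r + 1)
        needed -= r
        j += i
    raise ValueError("fewer relevant documents than requested")
```

Take the classes `[(2, 1), (4, 2), (3, 0)]`, listed in rank order as (size, relevant) pairs. For this input, `expected_asl` gives 7/2, `expected_relevant_at_cutoff(classes, 4)` gives 2, and `cooper_esl(classes, 2)` gives 5/3. Exact `Fraction` arithmetic lets these values be compared directly with a brute-force average over every tie-breaking order, which is one way to check such formulas against simulation.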