Paper information - Measuring constraint violations in information retrieval

Measuring constraint violations in information retrieval

Recently, an inductive approach to modelling term-weighting function correctness has provided a number of axioms (constraints) to which all good term-weighting functions should adhere. These constraints have been shown to be theoretically and empirically sound in a number of works. It has been shown that when a term-weighting function breaks one or more of the constraints, it typically indicates sub-optimality of that function. This elegant inductive approach may more accurately model the human process of determining the relevance of a document. It is intuitive that a person's notion of relevance changes as terms that are either on-topic or off-topic are encountered in a given document. Ultimately, it would be desirable to be able to mathematically determine the performance of term-weighting functions without the need for test collections. Many modern term-weighting functions do not satisfy the constraints in an unconditional manner. However, the degree to which these functions violate the constraints has not been investigated. A comparison between weighting functions from this perspective may shed light on the poor performance of certain functions in certain settings. Moreover, if a correlation exists between performance and the number of violations, measuring the degree of violation could help more accurately predict how a certain scheme will perform on a given collection.

Ronan Cummins | Colm O'Riordan

[1] ChengXiang Zhai, et al. An exploration of axiomatic approaches to information retrieval, 2005, SIGIR '05.

[2] Tao Tao, et al. A formal study of information retrieval heuristics, 2004, SIGIR '04.

[3] Ronan Cummins, et al. An axiomatic comparison of learned term-weighting schemes in information retrieval: clarifications and extensions, 2007, Artificial Intelligence Review.