Measuring constraint violations in information retrieval

Recently, an inductive approach to modelling the correctness of term-weighting functions has produced a number of axioms (constraints) to which all good term-weighting functions should adhere. These constraints have been shown to be both theoretically and empirically sound in a number of works: when a term-weighting function breaks one or more of them, this typically indicates that the function is sub-optimal. This elegant inductive approach may also more accurately model the human process of determining the relevance of a document, as it is intuitive that a person's notion of relevance changes as on-topic or off-topic terms are encountered in a given document. Ultimately, it would be desirable to determine the performance of term-weighting functions mathematically, without the need for test collections.

Many modern term-weighting functions do not satisfy the constraints unconditionally. However, the degree to which these functions violate the constraints has not been investigated. Comparing weighting functions from this perspective may shed light on the poor performance of certain functions in certain settings. Moreover, if a correlation exists between retrieval performance and the number of violations, measuring the degree of violation could help predict more accurately how a given scheme will perform on a given collection.
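As a concrete illustration of how such a degree of violation might be measured, the following Python sketch numerically probes a BM25-style weighting function against a TFC1-style monotonicity constraint (with document length held fixed, a higher within-document term frequency should yield a strictly higher weight) over a grid of term frequencies and document lengths. The function names, the constraint formulation, and the parameter grid are illustrative assumptions rather than a definitive methodology.

```python
import math

def bm25_weight(tf, doc_len, avg_doc_len, df, num_docs, k1=1.2, b=0.75):
    """BM25 weight of a single query term (Robertson/Sparck Jones IDF)."""
    idf = math.log((num_docs - df + 0.5) / (df + 0.5))
    length_norm = k1 * ((1.0 - b) + b * doc_len / avg_doc_len)
    return idf * tf * (k1 + 1.0) / (tf + length_norm)

def count_tfc1_violations(weight_fn, tfs, doc_lens, avg_doc_len, df, num_docs):
    """Probe a TFC1-style constraint over a grid of (tf, doc_len) points:
    holding document length fixed, incrementing the term frequency must
    strictly increase the weight. Returns (violations, checks)."""
    violations, checks = 0, 0
    for doc_len in doc_lens:
        for tf in tfs:
            w_low = weight_fn(tf, doc_len, avg_doc_len, df, num_docs)
            w_high = weight_fn(tf + 1, doc_len, avg_doc_len, df, num_docs)
            checks += 1
            if w_high <= w_low:  # the constraint demands a strict increase
                violations += 1
    return violations, checks

if __name__ == "__main__":
    # Hypothetical collection statistics, chosen only for illustration.
    v, n = count_tfc1_violations(
        bm25_weight,
        tfs=range(1, 21),
        doc_lens=range(50, 501, 50),
        avg_doc_len=250.0,
        df=1_000,
        num_docs=100_000,
    )
    print(f"TFC1-style violations: {v}/{n} checks ({100.0 * v / n:.1f}%)")
```

The resulting violation rate is a single number per (function, constraint, grid) triple; one plausible refinement would be to draw the grid from a collection's actual document-length and document-frequency statistics. With the positive IDF values used here, BM25 passes this particular probe; conditional constraints, such as those governing length normalization, could be checked with the same pattern.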