An Unbiased Generative Model for Setting Dissemination Thresholds

Information filtering systems based on statistical retrieval models usually compute a numeric score that indicates how well each document matches each profile. Documents with scores above profile-specific dissemination thresholds are delivered. Optimal dissemination thresholds are usually difficult to determine a priori, so they are often learned during filtering, using relevance feedback about disseminated documents. However, the scores of disseminated documents are a biased sample of the complete distribution of document scores, which causes some algorithms to learn suboptimal thresholds.
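The sampling bias described above can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the model the paper proposes: the Gaussian score distribution, the parameter values, and the function name `simulate_delivery_bias` are all hypothetical choices made for illustration. It draws scores for a stream of documents, keeps only those above the threshold (the only ones a filtering system would receive feedback on), and compares the mean of the delivered scores with the mean of all scores.

```python
import random


def simulate_delivery_bias(threshold=0.5, n_docs=20000, mu=0.0, sigma=1.0, seed=0):
    """Compare the mean score of all documents with the mean score of
    delivered documents (those scoring above the threshold).

    Assumption for illustration only: scores are Gaussian with mean `mu`
    and standard deviation `sigma`.
    """
    rng = random.Random(seed)
    all_scores = [rng.gauss(mu, sigma) for _ in range(n_docs)]

    # The system only observes feedback for documents it disseminates.
    delivered = [s for s in all_scores if s > threshold]

    mean_all = sum(all_scores) / len(all_scores)
    mean_delivered = sum(delivered) / len(delivered)
    return mean_all, mean_delivered, len(delivered)


if __name__ == "__main__":
    mean_all, mean_delivered, n_delivered = simulate_delivery_bias()
    print(f"mean of all scores:       {mean_all:.3f}")
    print(f"mean of delivered scores: {mean_delivered:.3f} ({n_delivered} documents)")
```

Because every delivered score lies above the threshold, a mean estimated from delivered documents alone is necessarily higher than the true mean of the full score distribution. A threshold learner that fits distribution parameters to these scores without correcting for the truncation would inherit that error.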
