A Comparison of Evaluation Metrics for Document Filtering

Although document filtering is simple to define, a wide range of evaluation measures has been proposed in the literature, and all of them have been subject to criticism. We present a unified, comparative view of the strengths and weaknesses of the proposed measures, based on two formal constraints (which any suitable evaluation measure should satisfy) and various properties (which help differentiate measures according to their behaviour). We conclude that (i) some smoothing process is necessary to satisfy the basic constraints; and (ii) metrics can be grouped into three families, each satisfying one of three formal properties, which are mutually exclusive, i.e. no metric can satisfy all three properties simultaneously.
