Relevance Feedback using Support Vector Machines

We show that support vector machines (SVMs) perform much better than conventional algorithms in a relevance feedback (RF) setting for information retrieval (IR) of text documents. We track performance as a function of feedback iteration and show that, while the conventional algorithms do very well in the initial feedback iteration when the topic searched for has high visibility in the database, they do very poorly when the relevant documents make up only a small percentage of the database. SVMs, however, do very well when the number of documents returned in the preliminary search is low and the number of relevant documents is small. The competing algorithms examined are Rocchio, Ide regular, and Ide dec-hi.
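For readers unfamiliar with the three baseline algorithms named above, the following is a minimal sketch of their query-update rules as they are commonly stated in the IR literature. It is not taken from this paper: the function names, the dense list-of-floats vector representation, and the default weights `alpha`, `beta`, and `gamma` are illustrative assumptions, and the paper's own term weighting and parameter choices may differ.

```python
from typing import List, Sequence

Vector = List[float]


def _add(a: Sequence[float], b: Sequence[float], scale: float = 1.0) -> Vector:
    """Return a + scale * b, element by element."""
    return [x + scale * y for x, y in zip(a, b)]


def _sum(docs: Sequence[Sequence[float]], dim: int) -> Vector:
    """Sum a collection of document vectors (zero vector if empty)."""
    total = [0.0] * dim
    for d in docs:
        total = _add(total, d)
    return total


def rocchio(query, relevant, nonrelevant,
            alpha=1.0, beta=0.75, gamma=0.15) -> Vector:
    """Rocchio update: move the query toward the mean relevant document
    and away from the mean non-relevant document.
    The default weights are illustrative assumptions, not the paper's."""
    dim = len(query)
    new_q = [alpha * x for x in query]
    if relevant:
        new_q = _add(new_q, _sum(relevant, dim), beta / len(relevant))
    if nonrelevant:
        new_q = _add(new_q, _sum(nonrelevant, dim), -gamma / len(nonrelevant))
    return new_q


def ide_regular(query, relevant, nonrelevant) -> Vector:
    """Ide regular update: add every relevant document and subtract every
    non-relevant document, without normalising by set size."""
    dim = len(query)
    new_q = _add(query, _sum(relevant, dim))
    return _add(new_q, _sum(nonrelevant, dim), -1.0)


def ide_dec_hi(query, relevant, nonrelevant_ranked) -> Vector:
    """Ide dec-hi update: add every relevant document but subtract only the
    highest-ranked non-relevant one (assumed to be first in the list)."""
    dim = len(query)
    new_q = _add(query, _sum(relevant, dim))
    if nonrelevant_ranked:
        new_q = _add(new_q, nonrelevant_ranked[0], -1.0)
    return new_q


if __name__ == "__main__":
    # Toy three-term vocabulary; all vectors are hypothetical.
    q = [1.0, 0.0, 0.0]
    rel = [[0.0, 1.0, 0.0], [0.0, 1.0, 1.0]]
    nonrel = [[1.0, 0.0, 1.0], [0.0, 0.0, 1.0]]  # ordered by retrieval rank
    print(rocchio(q, rel, nonrel))
    print(ide_regular(q, rel, nonrel))
    print(ide_dec_hi(q, rel, nonrel))
```

All three rules re-weight a single query vector using the judged documents. An SVM-based approach instead trains a classifier on the judged documents at each feedback iteration and ranks the remaining documents by their classifier score.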
