Automated Text Categorization Using Support Vector Machine

In this paper, we study the use of support vector machines (SVMs) in text categorization. Unlike other machine learning techniques, an SVM allows easy incorporation of new documents into an existing trained system. Moreover, dimension reduction, which is usually imperative, now becomes optional. Thus, SVMs adapt efficiently in dynamic environments that require frequent additions to the document collection. Empirical results on the Reuters-22173 collection are also discussed.
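The abstract does not say how new documents are folded into a trained system, so the following is only a minimal sketch of the general idea, not the paper's procedure. It assumes a linear SVM trained by plain subgradient steps on the regularized hinge loss, with raw term counts as features and no dimension reduction. The class name `OnlineLinearSVM`, its parameters, and the toy documents are all hypothetical.

```python
from collections import Counter


class OnlineLinearSVM:
    """Linear SVM sketch: subgradient steps on the L2-regularized hinge loss.

    Assumption: this online update stands in for whatever incremental
    scheme the paper uses; it is an illustration only.
    """

    def __init__(self, learning_rate=0.5, reg=0.01):
        self.learning_rate = learning_rate
        self.reg = reg
        self.weights = {}  # term -> weight; grows whenever a new term appears

    def _features(self, text):
        # Raw term counts over the full vocabulary (no dimension reduction).
        return Counter(text.lower().split())

    def score(self, text):
        x = self._features(text)
        return sum(self.weights.get(t, 0.0) * c for t, c in x.items())

    def predict(self, text):
        return 1 if self.score(text) > 0 else -1

    def partial_fit(self, docs, labels, epochs=1):
        """Update the existing weights with more labeled documents."""
        shrink = 1.0 - self.learning_rate * self.reg
        for _ in range(epochs):
            for text, y in zip(docs, labels):
                x = self._features(text)
                margin = y * sum(self.weights.get(t, 0.0) * c for t, c in x.items())
                # Regularization step: shrink every existing weight.
                for t in self.weights:
                    self.weights[t] *= shrink
                # Hinge-loss step: only documents inside the margin move the weights.
                if margin < 1:
                    for t, c in x.items():
                        self.weights[t] = self.weights.get(t, 0.0) + self.learning_rate * y * c
        return self


# Toy two-class problem: +1 = grain-related, -1 = anything else.
train_docs = [
    "wheat harvest grain export",
    "grain prices wheat",
    "bank interest rate",
    "bank merger stock",
]
train_labels = [1, 1, -1, -1]

model = OnlineLinearSVM().partial_fit(train_docs, train_labels, epochs=5)

# "corn" and "yield" are unseen so far, so the score is 0 and the default is -1.
before = model.predict("corn yield")

# Add one new labeled document without retraining from scratch.
model.partial_fit(["corn harvest yield"], [1])
after = model.predict("corn yield")

print(before, after)
```

Because the weights live in a dictionary keyed by term, a new document can introduce new vocabulary without rebuilding a fixed-size feature space, which is one simple way to picture why dimension reduction can be treated as optional here.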
