Two odds-ratio-based text classification algorithms

Since the 1990s, the exponential growth of Web documents has led to a great deal of interest in developing efficient tools and software to assist users in finding relevant information. Text classification has proven useful in helping organize and search text information on the Web. Although a number of text classification algorithms already exist, most of them are either inefficient or too complex. In this paper we present two Odds-Ratio-based text classification algorithms, called OR and TF*OR. We have evaluated our algorithms on two text collections and compared them against k-NN and SVM. Experimental results show that OR and TF*OR are competitive with k-NN and SVM, while being much simpler and faster. The results also indicate that it is not TF (term frequency) but the relevance factors derived from the Odds Ratio that play the decisive role in document categorization.
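The abstract does not give the exact formulas, so the following is only a minimal sketch of how an odds-ratio classifier of this kind might work. It assumes the standard log odds ratio used in feature selection, log[P(t|c)(1 - P(t|not c)) / ((1 - P(t|c)) P(t|not c))], estimated from document frequencies with Laplace smoothing, as the per-category relevance factor of a term. It also assumes that "OR" scores a document by summing the relevance factors of its distinct terms, while "TF*OR" weights each factor by the term's frequency in the document. The function names `odds_ratio_weights` and `classify` and the `smoothing` parameter are hypothetical, not taken from the paper.

```python
import math
from collections import Counter


def odds_ratio_weights(docs, labels, smoothing=1.0):
    """Hypothetical sketch: per-category log odds ratio for every term.

    docs: list of token lists; labels: one category name per document.
    Returns {category: {term: log odds ratio}} estimated from document
    frequencies with Laplace smoothing (an assumption, not the paper's spec).
    """
    categories = sorted(set(labels))
    vocab = {t for doc in docs for t in doc}
    n_docs = Counter(labels)
    # Document frequency of each term within each category.
    df = {c: Counter() for c in categories}
    for doc, c in zip(docs, labels):
        df[c].update(set(doc))
    # Document frequency of each term over the whole collection.
    total_df = Counter()
    for c in categories:
        total_df.update(df[c])
    n_total = len(docs)

    weights = {}
    for c in categories:
        n_in, n_out = n_docs[c], n_total - n_docs[c]
        w = {}
        for t in vocab:
            # Smoothed P(t|c) and P(t|not c), both strictly inside (0, 1).
            p = (df[c][t] + smoothing) / (n_in + 2 * smoothing)
            q = (total_df[t] - df[c][t] + smoothing) / (n_out + 2 * smoothing)
            w[t] = math.log(p * (1 - q) / ((1 - p) * q))
        weights[c] = w
    return weights


def classify(doc, weights, use_tf=False):
    """Assign the category with the highest score.

    use_tf=False: sum the relevance factors of distinct terms (assumed OR).
    use_tf=True: weight each factor by term frequency (assumed TF*OR).
    Terms never seen in training contribute 0.
    """
    counts = Counter(doc)
    best, best_score = None, -math.inf
    for c, w in weights.items():
        if use_tf:
            score = sum(tf * w.get(t, 0.0) for t, tf in counts.items())
        else:
            score = sum(w.get(t, 0.0) for t in counts)
        if score > best_score:
            best, best_score = c, score
    return best


if __name__ == "__main__":
    docs = [["cheap", "pills", "buy"], ["buy", "cheap", "now"],
            ["meeting", "agenda", "notes"], ["project", "meeting", "schedule"]]
    labels = ["spam", "spam", "work", "work"]
    w = odds_ratio_weights(docs, labels)
    print(classify(["cheap", "buy", "meeting"], w))  # prints "spam"
```

Because the relevance factors are precomputed once per category, classifying a document costs only one dictionary lookup per term and category, which is consistent with the abstract's claim that such methods are simpler and faster than k-NN or SVM; the toy data above is illustrative only.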
