An empirical comparison of min–max-modular k-NN with different voting methods to large-scale text categorization

Text categorization is the task of assigning predefined classes to text documents based on their content. The k-NN algorithm is one of the top-performing classifiers on text data. However, little research has examined the use of different voting methods on text data. Moreover, when a large amount of training data is available online, response time increases, since the distance from a test document to every training sample must be computed. Min–max-modular k-NN (M3-k-NN), on the other hand, has been applied to large-scale text categorization, where it achieves good performance and a faster response in a parallel computing environment. In this paper, we investigate five different voting methods for k-NN and M3-k-NN. The experimental results and analysis show that the Gaussian voting method achieves the best performance among all voting methods for both k-NN and M3-k-NN. In addition, M3-k-NN achieves better performance than k-NN with a smaller value of k, and is thus faster than k-NN in a parallel computing environment.
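To make the idea of a voting method concrete, the following minimal sketch contrasts plain majority voting with a Gaussian-weighted vote in k-NN. It is an illustration under assumptions, not the formulation used in the paper: the function name `knn_vote`, the `sigma` parameter, the use of Euclidean distance, and the exact form of the Gaussian weight are all hypothetical choices made here.

```python
import math
from collections import defaultdict

def knn_vote(query, train, k, voting="majority", sigma=1.0):
    """Classify `query` from its k nearest training points.

    train:  list of (vector, label) pairs.
    voting: "majority" (each neighbor counts 1) or "gaussian"
            (each neighbor counts exp(-d^2 / (2 * sigma^2))).
    Euclidean distance and the Gaussian form are assumptions of this sketch.
    """
    # The distance to every training sample is computed; this is the
    # per-query cost that grows with the size of the training set.
    scored = sorted((math.dist(query, vec), label) for vec, label in train)
    neighbors = scored[:k]

    votes = defaultdict(float)
    for dist, label in neighbors:
        if voting == "majority":
            votes[label] += 1.0
        elif voting == "gaussian":
            votes[label] += math.exp(-dist ** 2 / (2.0 * sigma ** 2))
        else:
            raise ValueError(f"unknown voting method: {voting}")
    # Return the label with the largest accumulated vote.
    return max(votes, key=votes.get)
```

With a weighted vote, one very close neighbor can outweigh several distant ones, which is why the choice of voting method can change the predicted class for the same k.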
