An Adaptive Fuzzy kNN Text Classifier Based on Gini Index Weight

In recent years, the kNN algorithm has attracted attention from many researchers and has proved to be one of the best text categorization algorithms. Text categorization uses a training set of documents with assigned class labels to decide which class a new, unlabeled document belongs to. For a classifier, however, text preprocessing is the bottleneck of categorization: the original feature space typically contains many thousands of words, so its dimension is very high. In this paper, we therefore adopt a new feature weighting method, an improved Gini index, to reduce the dimension of the feature space and improve categorization precision. In addition, we discuss improvements to the decision rule and to dimension selection, and we design an adaptive fuzzy kNN text classifier, where "adaptive" refers to the adaptive selection of dimensions. The experimental results show that our algorithm is effective and feasible.
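
The pipeline outlined in the abstract can be illustrated with a minimal sketch. It rests on several assumptions, since the abstract does not give the formulas: the term score below is one commonly used Gini-index formulation for text, sum over classes of P(t|c)^2 * P(c|t)^2, and not necessarily the improved variant proposed here; the fuzzy decision rule follows the general fuzzy kNN idea of distance-weighted class memberships with a fuzziness exponent m; and a fixed cutoff `top_n` stands in for the adaptive dimension selection, whose rule is not described. All function and parameter names are hypothetical.

```python
import math
from collections import Counter, defaultdict

def gini_scores(docs, labels):
    """Assumed Gini-style score per term: sum_c P(t|c)^2 * P(c|t)^2,
    estimated from document frequencies."""
    class_sizes = Counter(labels)
    df = defaultdict(Counter)  # term -> class -> docs containing the term
    for tokens, label in zip(docs, labels):
        for term in set(tokens):
            df[term][label] += 1
    scores = {}
    for term, per_class in df.items():
        total = sum(per_class.values())
        scores[term] = sum(
            (n / class_sizes[c]) ** 2 * (n / total) ** 2
            for c, n in per_class.items()
        )
    return scores

def select_terms(scores, top_n):
    """Keep the top_n highest-scoring terms (fixed cutoff, not adaptive)."""
    ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
    return dict(ranked[:top_n])

def vectorize(tokens, weights):
    """Sparse vector: term frequency times the term's Gini weight."""
    counts = Counter(t for t in tokens if t in weights)
    return {t: n * weights[t] for t, n in counts.items()}

def cosine(a, b):
    dot = sum(v * b.get(t, 0.0) for t, v in a.items())
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def fuzzy_knn(query, train_vecs, labels, k=3, m=2.0, eps=1e-9):
    """Return class memberships that sum to 1 over the k nearest neighbors.
    Each neighbor votes with weight 1 / d^(2/(m-1)), where d = 1 - cosine."""
    neighbors = sorted(
        ((cosine(query, v), y) for v, y in zip(train_vecs, labels)),
        reverse=True,
    )[:k]
    votes = defaultdict(float)
    for sim, label in neighbors:
        dist = max(0.0, 1.0 - sim)
        votes[label] += 1.0 / (dist ** (2.0 / (m - 1.0)) + eps)
    total = sum(votes.values())
    return {label: w / total for label, w in votes.items()}

# Toy corpus, invented purely for illustration.
docs = [
    ["ball", "goal", "team"],
    ["team", "match", "goal"],
    ["stock", "market", "bank"],
    ["bank", "loan", "market"],
]
labels = ["sport", "sport", "finance", "finance"]

scores = gini_scores(docs, labels)
weights = select_terms(scores, top_n=4)
train_vecs = [vectorize(d, weights) for d in docs]
memberships = fuzzy_knn(vectorize(["goal", "team", "win"], weights),
                        train_vecs, labels, k=3)
predicted = max(memberships, key=memberships.get)
print(predicted, memberships)
```

Returning graded memberships instead of a single label is what makes the rule fuzzy: a document that lies between two classes shows comparable membership in both, and the caller can decide how to threshold.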
