Paper information - Locally Adaptive Text Classification Based K-Nearest Neighbors

Locally Adaptive Text Classification Based K-Nearest Neighbors

Due to the exponential growth of documents on the Internet and the emerging need to organize them, the automated categorization of documents into predefined labels has received ever-increasing attention in recent years. Among the available classifiers, the k-nearest neighbors classifier (KNNC) is widely used in the text categorization community because of its simplicity and efficiency. However, KNNC still suffers from inductive biases or model misfits that result from its assumptions, such as the presumption that training data are evenly distributed among all categories. In this paper, we propose a new refinement strategy (LAKNNC) for the KNN classifier. It adopts the sum-of-squared-error criterion to adaptively select the contributing subset of the k nearest neighbors, and it classifies the input document in terms of the degree of disturbance that the document brings to the kernel densities of the selected neighbors. The experimental results indicate that LAKNNC is not sensitive to the parameter k and achieves a significant improvement in classification performance on imbalanced corpora.
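To make the motivating problem concrete, the following is a minimal sketch of the plain majority-vote KNN baseline that the abstract criticizes, not of LAKNNC itself. The abstract does not give LAKNNC's details. The function names, the toy corpus, and the use of cosine similarity over raw term counts are all assumptions chosen for illustration.

```python
import math
from collections import Counter

def cosine(a, b):
    """Cosine similarity between two sparse term-count mappings."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def knn_classify(query, training, k):
    """Label `query` by majority vote among its k most similar training documents."""
    vec = Counter(query.split())
    scored = sorted(
        ((cosine(vec, Counter(text.split())), label) for text, label in training),
        key=lambda pair: pair[0],
        reverse=True,
    )
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]

# Hypothetical imbalanced toy corpus: four "finance" documents, one "sports".
training = [
    ("stock market prices rise", "finance"),
    ("market shares fall sharply", "finance"),
    ("bank raises interest rates", "finance"),
    ("investors buy market bonds", "finance"),
    ("team wins football match", "sports"),
]
query = "football team match"

print(knn_classify(query, training, k=1))  # sports
print(knn_classify(query, training, k=5))  # finance
```

With k=1 the only relevant neighbor decides the label. With k=5 the four unrelated majority-class documents outvote it, even though their similarity to the query is zero. This dependence on k and on class balance is the behavior that a refinement strategy such as the one proposed in the abstract aims to reduce.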

Xiao-peng Yu | Xiao-gao Yu
