Novel Text Classification Based on K-Nearest Neighbor

The k-nearest neighbors classifier (KNNC) is widely used because of its simplicity and efficiency. It consists of two stages: k-nearest neighbors search (KNNS) and classification. Existing centralized KNNS does not scale to large volumes of data, and the classification stage still suffers from inductive biases arising from its assumptions, such as the presumption that training data are evenly distributed. This paper proposes a method, P2PKNNC, which improves the performance of kNN-based text classification under the P2P communication paradigm. P2PKNNC adaptively executes k-nearest-neighbor queries over a distributed metric structure based on generalized hyperplane partitioning. To handle uneven text sets, it then selects the influencing subset of these neighbors and classifies the input document according to the degree of disturbance it introduces into the kernel densities of those influencing neighbors. The experimental results indicate that our algorithm achieves a significant improvement in classification performance on imbalanced corpora.
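For the search stage, the abstract states that queries are routed through a distributed metric structure built on generalized hyperplane partitioning. The sketch below illustrates only the standard pruning rule of such a partition, not the paper's P2P protocol: a node holding two pivots forwards a range query (radius r around the query object) to every side that cannot be excluded by the condition |d(q, p1) - d(q, p2)| > 2r. The pivot names, the Euclidean distance, and the `ght_route` helper are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np


def ght_route(query, pivot_a, pivot_b, radius):
    """Decide which side(s) of a generalized-hyperplane split a range
    query (centre `query`, search radius `radius`) must visit.

    The split assigns an object o to side A when d(o, pivot_a) <= d(o, pivot_b)
    and to side B otherwise.  A side can be pruned only when the standard
    generalized-hyperplane condition |d(q, a) - d(q, b)| > 2 * radius holds.
    """
    da = np.linalg.norm(query - pivot_a)
    db = np.linalg.norm(query - pivot_b)
    sides = []
    if da - db <= 2 * radius:   # side A (closer to pivot_a) cannot be pruned
        sides.append("A")
    if db - da <= 2 * radius:   # side B (closer to pivot_b) cannot be pruned
        sides.append("B")
    return sides
```

In a kNN query the radius is typically the distance to the current k-th best candidate, so the set of sides to visit shrinks as better neighbors are found.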
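The classification step is described only at a high level: the query document is assigned according to the disturbance it brings to the kernel densities of its influencing neighbors. The following is a minimal sketch of one plausible reading under stated assumptions (a Gaussian kernel, the per-class density estimated at the class neighborhood's centroid, and assignment to the class whose density changes least when the query is absorbed); the paper's actual disturbance measure, neighbor-selection rule, and decision criterion may differ.

```python
import numpy as np


def kde_at(point, samples, bandwidth):
    """Gaussian kernel density estimate at `point` from sample rows."""
    if len(samples) == 0:
        return 0.0
    dists = np.linalg.norm(samples - point, axis=1)
    return np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2)).mean()


def density_disturbance_classify(query, neighbors, labels, bandwidth=1.0):
    """Assign `query` to the class of its k nearest neighbors whose local
    kernel density is disturbed least by absorbing the query.

    neighbors : (k, d) array of the k nearest neighbor vectors
    labels    : length-k array of their class labels
    """
    best_label, best_disturbance = None, np.inf
    for label in np.unique(labels):
        class_pts = neighbors[labels == label]
        centroid = class_pts.mean(axis=0)
        # Density of the class neighborhood before and after adding the query.
        before = kde_at(centroid, class_pts, bandwidth)
        after = kde_at(centroid, np.vstack([class_pts, query]), bandwidth)
        disturbance = abs(after - before) / (before + 1e-12)
        if disturbance < best_disturbance:
            best_label, best_disturbance = label, disturbance
    return best_label


# Toy usage with synthetic document vectors (hypothetical class names).
rng = np.random.default_rng(0)
neighbors = rng.normal(size=(10, 5))
labels = np.array(["sports"] * 7 + ["finance"] * 3)
query = rng.normal(size=5)
print(density_disturbance_classify(query, neighbors, labels))
```

Because the decision is based on a relative change in each class's own density rather than on raw neighbor counts, a majority class with many neighbors does not automatically dominate, which is consistent with the abstract's focus on imbalanced corpora.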