Novel Text Classification Based on K-Nearest Neighbor

The k-nearest neighbors classifier (KNNC) is widely used because of its simplicity and efficiency. It consists of two stages: k-nearest neighbors search (KNNS) and classification. Existing centralized KNNS does not scale to large volumes of data, and the classification stage still suffers from inductive biases arising from its assumptions, such as the presumption that training data are evenly distributed. This paper proposes a method, P2PKNNC, which improves the performance of kNN-based text classification under the P2P communication paradigm. P2PKNNC adaptively executes k-nearest-neighbor queries over a distributed metric structure based on generalized hyperplane partitioning. To handle uneven text sets, it then selects the influencing subset of these neighbors and classifies the input document according to the degree of disturbance it introduces into the kernel densities of those influencing neighbors. The experimental results indicate that our algorithm achieves a significant improvement in classification performance on imbalanced corpora.
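For the search stage, the abstract states that queries are routed through a distributed metric structure built on generalized hyperplane partitioning. The sketch below illustrates only the standard pruning rule of such a partition, not the paper's P2P protocol: a node holding two pivots forwards a range query (radius r around the query object) to every side that cannot be excluded by the condition |d(q, p1) - d(q, p2)| > 2r. The pivot names, the Euclidean distance, and the `ght_route` helper are illustrative assumptions, not identifiers from the paper.

```python
import numpy as np


def ght_route(query, pivot_a, pivot_b, radius):
    """Decide which side(s) of a generalized-hyperplane split a range
    query (centre `query`, search radius `radius`) must visit.

    The split assigns an object o to side A when d(o, pivot_a) <= d(o, pivot_b)
    and to side B otherwise.  A side can be pruned only when the standard
    generalized-hyperplane condition |d(q, a) - d(q, b)| > 2 * radius holds.
    """
    da = np.linalg.norm(query - pivot_a)
    db = np.linalg.norm(query - pivot_b)
    sides = []
    if da - db <= 2 * radius:   # side A (closer to pivot_a) cannot be pruned
        sides.append("A")
    if db - da <= 2 * radius:   # side B (closer to pivot_b) cannot be pruned
        sides.append("B")
    return sides
```

In a kNN query the radius is typically the distance to the current k-th best candidate, so the set of sides to visit shrinks as better neighbors are found.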
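The classification step is described only at a high level: the query document is assigned according to the disturbance it brings to the kernel densities of its influencing neighbors. The following is a minimal sketch of one plausible reading under stated assumptions (a Gaussian kernel, the per-class density estimated at the class neighborhood's centroid, and assignment to the class whose density changes least when the query is absorbed); the paper's actual disturbance measure, neighbor-selection rule, and decision criterion may differ.

```python
import numpy as np


def kde_at(point, samples, bandwidth):
    """Gaussian kernel density estimate at `point` from sample rows."""
    if len(samples) == 0:
        return 0.0
    dists = np.linalg.norm(samples - point, axis=1)
    return np.exp(-(dists ** 2) / (2.0 * bandwidth ** 2)).mean()


def density_disturbance_classify(query, neighbors, labels, bandwidth=1.0):
    """Assign `query` to the class of its k nearest neighbors whose local
    kernel density is disturbed least by absorbing the query.

    neighbors : (k, d) array of the k nearest neighbor vectors
    labels    : length-k array of their class labels
    """
    best_label, best_disturbance = None, np.inf
    for label in np.unique(labels):
        class_pts = neighbors[labels == label]
        centroid = class_pts.mean(axis=0)
        # Density of the class neighborhood before and after adding the query.
        before = kde_at(centroid, class_pts, bandwidth)
        after = kde_at(centroid, np.vstack([class_pts, query]), bandwidth)
        disturbance = abs(after - before) / (before + 1e-12)
        if disturbance < best_disturbance:
            best_label, best_disturbance = label, disturbance
    return best_label


# Toy usage with synthetic document vectors (hypothetical class names).
rng = np.random.default_rng(0)
neighbors = rng.normal(size=(10, 5))
labels = np.array(["sports"] * 7 + ["finance"] * 3)
query = rng.normal(size=5)
print(density_disturbance_classify(query, neighbors, labels))
```

Because the decision is based on a relative change in each class's own density rather than on raw neighbor counts, a majority class with many neighbors does not automatically dominate, which is consistent with the abstract's focus on imbalanced corpora.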