An Optimized K-Nearest Neighbor Algorithm for Large Scale Hierarchical Text Classification

This paper summarizes an optimized k-nearest neighbor (k-NN) algorithm for the 2nd edition of the Large Scale Hierarchical Text Classification Pascal Challenge. First, we run the k-NN algorithm on the datasets to obtain the top-k nearest neighbors of each test document. Second, we identify several critical category-neighbor features and estimate the impact of each feature through cross-validation. Finally, the category prediction algorithm uses the optimal parameters for the category-neighbor features to predict the categories of the test documents. Experiments on the three challenge datasets show that the classifier achieves high accuracy.
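
The three steps above can be illustrated with a minimal sketch. It is not the authors' implementation: the cosine similarity measure, the similarity-sum score, and the parameters `alpha` and `threshold` are assumptions chosen for illustration, standing in for the category-neighbor features and tuned parameters that the abstract does not specify. Documents are assumed to be sparse term-weight dictionaries, and all function names are hypothetical.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as {term: weight} dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_neighbors(doc, train, k):
    """Step 1: return the k most similar training documents.

    `train` is a list of (vector, categories) pairs; the result is a list
    of (similarity, categories) pairs, most similar first.
    """
    scored = [(cosine(doc, vec), cats) for vec, cats in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict_categories(doc, train, k=3, alpha=1.0, threshold=0.5):
    """Steps 2 and 3: score each category seen among the neighbors, then predict.

    Assumed feature: the sum of similarity**alpha over the neighbors that
    carry the category. `alpha` and `threshold` are hypothetical parameters
    of the kind that would be tuned by cross-validation.
    """
    scores = defaultdict(float)
    for sim, cats in top_k_neighbors(doc, train, k):
        for cat in cats:
            scores[cat] += sim ** alpha

    best = max(scores.values(), default=0.0)
    if best <= 0.0:
        return []  # no neighbor shares any term with the document

    # Keep every category whose score is close enough to the best one,
    # so that a document can receive more than one category.
    kept = [cat for cat, score in scores.items() if score >= threshold * best]
    return sorted(kept, key=lambda cat: scores[cat], reverse=True)
```

In this sketch, cross-validation would amount to holding out part of the training set, calling `predict_categories` over a grid of `alpha` and `threshold` values, and keeping the combination that scores best on the held-out documents.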