Paper information - An Optimized K-Nearest Neighbor Algorithm for Large Scale Hierarchical Text Classification

An Optimized K-Nearest Neighbor Algorithm for Large Scale Hierarchical Text Classification

This paper summarizes an optimized k-nearest neighbor (k-NN) algorithm for the second edition of the Large Scale Hierarchical Text Classification (LSHTC) Pascal Challenge. First, the k-NN algorithm is run on the datasets to obtain the top-k nearest neighbors of each test document. Second, several critical category-neighbor features are identified, and the impact of each feature is estimated through cross-validation. Finally, the category prediction algorithm uses the optimal parameters for these features to predict the categories of the test documents. Experiments on the three challenge datasets show that the classifier achieves high accuracy.
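The three steps in the abstract can be illustrated with a minimal sketch. The abstract does not name the similarity measure, the category-neighbor features, or the tuned parameters, so everything specific here is an assumption made for illustration: cosine similarity over sparse term-weight dictionaries, a single feature (the summed neighbor similarity per category, raised to a hypothetical exponent `alpha`), and a hypothetical cut-off `ratio` relative to the best-scoring category. In practice `k`, `alpha`, and `ratio` would be chosen by cross-validation, as the abstract describes.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two sparse term-weight dicts."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the shorter vector
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_neighbors(doc, train, k):
    """Step 1: return the k most similar training items as (similarity, categories)."""
    scored = [(cosine(doc, vec), cats) for vec, cats in train]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict(doc, train, k=3, alpha=1.0, ratio=0.8):
    """Steps 2 and 3: score categories from the neighbors, then apply a cut-off.

    alpha and ratio are hypothetical parameters, not taken from the paper.
    """
    scores = defaultdict(float)
    for sim, cats in top_k_neighbors(doc, train, k):
        for cat in cats:
            scores[cat] += sim ** alpha  # illustrative category-neighbor feature
    best = max(scores.values(), default=0.0)
    if best == 0.0:
        return []  # no neighbor shares any term with the document
    return sorted(cat for cat, s in scores.items() if s >= ratio * best)


# Toy training set: (term weights, category labels); labels are made up.
train = [
    ({"cat": 1.0, "pet": 1.0}, ["animals"]),
    ({"dog": 1.0, "pet": 1.0}, ["animals"]),
    ({"stock": 1.0, "market": 1.0}, ["finance"]),
    ({"pet": 1.0, "market": 1.0}, ["animals", "finance"]),
]
doc = {"cat": 1.0, "pet": 1.0, "dog": 1.0}

print(predict(doc, train, k=3, ratio=0.8))  # ['animals']
print(predict(doc, train, k=3, ratio=0.1))  # ['animals', 'finance']
```

Lowering `ratio` admits more categories per document, which matters for multi-label data; a brute-force neighbor search like this one would need an inverted index or similar structure to scale to challenge-sized datasets.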

Chunyan Miao | Zhiqi Shen | Xiaogang Han | Junfa Liu
