Modelling Citation Networks for Improving Scientific Paper Classification Performance

This paper describes an approach that uses citation links to improve the classification of scientific papers. We develop two refinement functions, a linear label refinement (LLR) and a probabilistic label refinement (PLR), that model the citation link structure among papers and refine the class labels assigned to documents by a content-based Naive Bayes classifier. Both refinement models are evaluated against the content-based Naive Bayes method on a standard paper classification data set with increasing training set sizes. The results suggest that both refinement models significantly improve performance over the content-based method for all training set sizes, and that PLR outperforms LLR when sufficient training examples are available.
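
To make the idea of label refinement concrete, the following is a minimal sketch of one possible linear scheme. It is an illustration under stated assumptions, not the LLR function defined in the paper: the abstract does not give the formula, so the blending rule, the weight `alpha`, and the names `linear_label_refinement` and `predict` are all hypothetical. The sketch assumes each paper already has class probabilities from a content-based classifier and mixes them with the average probabilities of its citation neighbours.

```python
from typing import Dict, List

ClassProbs = Dict[str, float]


def linear_label_refinement(
    content_probs: Dict[str, ClassProbs],
    citations: Dict[str, List[str]],
    alpha: float = 0.5,  # hypothetical mixing weight, not taken from the paper
) -> Dict[str, ClassProbs]:
    """Blend each paper's content-based class probabilities with the
    mean probabilities of its citation neighbours (assumed scheme)."""
    refined: Dict[str, ClassProbs] = {}
    for paper, probs in content_probs.items():
        # Keep only neighbours that have content-based probabilities.
        neighbours = [n for n in citations.get(paper, []) if n in content_probs]
        if not neighbours:
            # No usable links: fall back to the content-based estimate.
            refined[paper] = dict(probs)
            continue
        refined[paper] = {}
        for label, p in probs.items():
            mean_nb = sum(
                content_probs[n].get(label, 0.0) for n in neighbours
            ) / len(neighbours)
            refined[paper][label] = alpha * p + (1 - alpha) * mean_nb
    return refined


def predict(probs: Dict[str, ClassProbs]) -> Dict[str, str]:
    """Pick the highest-scoring class for each paper."""
    return {paper: max(p, key=p.get) for paper, p in probs.items()}


if __name__ == "__main__":
    # Toy data: p1 is borderline on content alone but is linked to two ML papers.
    content_probs = {
        "p1": {"ML": 0.45, "DB": 0.55},
        "p2": {"ML": 0.90, "DB": 0.10},
        "p3": {"ML": 0.80, "DB": 0.20},
    }
    citations = {"p1": ["p2", "p3"], "p2": ["p1"], "p3": ["p1"]}
    print(predict(content_probs)["p1"])
    print(predict(linear_label_refinement(content_probs, citations))["p1"])
```

In the toy data, the content-based estimate alone labels `p1` as `DB`, while the blended estimate labels it `ML` because both of its citation neighbours are confidently `ML`. A probabilistic variant would replace the fixed weight with link statistics estimated from training data; the abstract does not specify how PLR does this, so it is not sketched here.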