Hierarchical Text Classification Model Based on Blocking Priori Knowledge

Blocking has a negative effect on the performance of hierarchical text classification. In this paper, a two-step hierarchical text classification model based on prior knowledge of blocking is proposed to address the problem. First, the blocking distribution is estimated, and a blocking-pair recognition technique is presented that mines the directions in which blocking is most serious. Second, the topology of the hierarchy is actively refined using this prior knowledge of blocking, in an attempt to correct misclassifications and reduce blocking errors. Experimental results on TanCorp, a new corpus designed for Chinese text classification, show that the model improves performance significantly without increasing the number of classifiers, and that it offers a way to solve the blocking problem in hierarchical classification. In addition, compared with flat text classification algorithms, the method shows stable performance.
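The two steps can be illustrated with a minimal sketch. It is not the authors' implementation, and it rests on several assumptions that the abstract does not state: blocking is taken to mean that a top-level classifier routes a document to the wrong parent category; a directed blocking pair (true parent, wrong parent) is called serious when its rate reaches a hypothetical `threshold`; and refinement is modeled by attaching the blocked children under the wrong parent as well, so that the existing lower-level classifier can still recover them. All function names and the toy categories are illustrative.

```python
from collections import Counter


def blocking_distribution(true_parents, predicted_parents):
    """Estimate the rate at which documents of each true parent are
    routed to each wrong parent (assumed definition of blocking)."""
    totals = Counter(true_parents)
    blocked = Counter()
    for true_parent, predicted_parent in zip(true_parents, predicted_parents):
        if true_parent != predicted_parent:
            blocked[(true_parent, predicted_parent)] += 1
    # Rate is relative to the number of documents of the true parent.
    return {pair: n / totals[pair[0]] for pair, n in blocked.items()}


def serious_blocking_pairs(distribution, threshold=0.1):
    """Keep directed pairs whose blocking rate reaches the threshold
    (the threshold value is a hypothetical parameter)."""
    return sorted(pair for pair, rate in distribution.items() if rate >= threshold)


def refine_hierarchy(hierarchy, pairs):
    """Return a copy of the hierarchy in which the children of each blocked
    parent are also reachable from the parent that wrongly attracts them.
    No new parent node, and hence no new classifier, is introduced."""
    refined = {parent: list(children) for parent, children in hierarchy.items()}
    for true_parent, wrong_parent in pairs:
        for child in hierarchy[true_parent]:
            if child not in refined[wrong_parent]:
                refined[wrong_parent].append(child)
    return refined


# Toy validation data: one of four "sports" documents is routed to "finance".
hierarchy = {"sports": ["football", "tennis"], "finance": ["stocks", "banking"]}
true_parents = ["sports"] * 4 + ["finance"] * 4
predicted_parents = ["sports", "sports", "finance", "sports"] + ["finance"] * 4

distribution = blocking_distribution(true_parents, predicted_parents)
pairs = serious_blocking_pairs(distribution, threshold=0.2)
refined = refine_hierarchy(hierarchy, pairs)
```

In this toy setting only the direction from "sports" to "finance" is flagged, so only the "finance" node gains extra children; the "sports" node is left unchanged. Treating the pair as directed is what lets the refinement target the serious blocking direction rather than both categories symmetrically.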