Chinese short-text classification in two steps

To mine text information effectively, three key issues in classifying Chinese short texts in two steps were discussed, and a method that combines naive Bayesian (NB) and k-nearest neighbor (KNN) classifiers was developed for this task. First, the test text collection was divided into three parts: part-A, which could be classified reliably by KNN; part-B, which could not be classified reliably by KNN but could be classified reliably by NB; and the remainder, part-C. This division was made by using the outputs of the NB or KNN classifier to construct a corresponding two-dimensional space and then partitioning that space according to the distribution of misclassified texts within it. Then, part-A and part-B were classified by the KNN and NB classifiers, respectively, and part-C was assigned labels directly according to the distribution of categories in the training data. The experimental results show that the proposed method achieves high performance compared with KNN, NB, and support vector machine (SVM).
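
The routing of a test text into part-A, part-B, or part-C can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify how the two-dimensional space is built, so the sketch assumes a simple stand-in in which reliability is measured by the margin between the two highest class scores. The names `margin`, `two_step_classify`, `knn_threshold`, and `nb_threshold` are hypothetical, and part-C is assumed to receive the most frequent training category as one possible reading of "the distribution of categories in the training data".

```python
from collections import Counter


def margin(scores):
    """Gap between the best and second-best class score.

    Assumption: this margin stands in for the reliability regions that
    the paper derives from its two-dimensional output space.
    """
    ranked = sorted(scores.values(), reverse=True)
    runner_up = ranked[1] if len(ranked) > 1 else 0.0
    return ranked[0] - runner_up


def two_step_classify(knn_scores, nb_scores, train_labels,
                      knn_threshold=0.5, nb_threshold=0.5):
    """Return (part, label) for one test text.

    knn_scores / nb_scores map each category to that classifier's score.
    The thresholds are hypothetical; in practice they would be tuned from
    where misclassified texts fall on held-out data.
    """
    # Part-A: KNN is considered reliable, so its prediction is used.
    if margin(knn_scores) >= knn_threshold:
        return "A", max(knn_scores, key=knn_scores.get)
    # Part-B: KNN is not reliable, but NB is.
    if margin(nb_scores) >= nb_threshold:
        return "B", max(nb_scores, key=nb_scores.get)
    # Part-C: neither is reliable; fall back on the training distribution
    # (assumed here to mean the most frequent training category).
    majority_label = Counter(train_labels).most_common(1)[0][0]
    return "C", majority_label
```

For instance, a text whose KNN scores are `{"sports": 0.9, "finance": 0.1}` would be routed to part-A and labeled by KNN, while a text for which both classifiers give near-tied scores would fall into part-C and receive the fallback label.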