Improved KNN classification algorithms research in text categorization

Text classification is an important part of information retrieval and text mining. In text classification, the traditional KNN algorithm is computationally expensive, and its precision falls when categories have much in common. To address these two problems, an improved KNN method is proposed: first, the k0 most likely candidate categories are obtained with the Rocchio classification method; then, a subset of representative samples is extracted from the training documents of those k0 categories, so that KNN only has to compare against this reduced set. The method alleviates both problems to a certain extent and improves classification performance.
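The two-stage idea can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several details are assumptions because the abstract does not specify them: documents are already term-weight vectors, similarity is cosine, Rocchio is reduced to one centroid per category (no negative-example term), "representative" samples are taken to be the n_rep documents closest to their category centroid, and the final decision is a simple majority vote. The names fit, predict, n_rep and k are hypothetical.

```python
import numpy as np
from collections import Counter


def _unit(X):
    """Scale vectors to unit length so that a dot product equals cosine similarity."""
    X = np.asarray(X, dtype=float)
    norms = np.linalg.norm(X, axis=-1, keepdims=True)
    return X / np.where(norms == 0, 1.0, norms)


def fit(X, y, n_rep=2):
    """Build, per category, a Rocchio-style centroid and a small representative set.

    Assumption: the representatives are the n_rep training documents
    most similar to their own category centroid.
    """
    X = _unit(X)
    y = np.asarray(y)
    model = {}
    for label in np.unique(y).tolist():
        docs = X[y == label]
        centroid = _unit(docs.mean(axis=0))
        closest = np.argsort(docs @ centroid)[::-1][:n_rep]
        model[label] = (centroid, docs[closest])
    return model


def predict(model, x, k0=2, k=3):
    """Stage 1: keep the k0 categories whose centroids are most similar to x.
    Stage 2: run KNN only over the representatives of those categories."""
    x = _unit(x)
    labels = list(model)
    centroid_sims = np.array([model[label][0] @ x for label in labels])
    candidates = [labels[i] for i in np.argsort(centroid_sims)[::-1][:k0]]

    scored = []
    for label in candidates:
        for sim in model[label][1] @ x:
            scored.append((sim, label))
    scored.sort(key=lambda pair: pair[0], reverse=True)

    # Majority vote among the k nearest representatives (assumed voting rule).
    votes = Counter(label for _, label in scored[:k])
    return votes.most_common(1)[0][0]
```

Under these assumptions, a query is compared against one centroid per category plus at most k0 * n_rep representative documents, rather than against every training document, which is where the reduction in computation would come from.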