Fast Text Classification Based on Dynamical Generation of Representative Samples

As a simple, effective, nonparametric classification method, the k-Nearest Neighbor (kNN) method is widely used in text classification, but it is computationally expensive because each test document must be compared against every training sample. This paper proposes a fast text classification approach that addresses this problem. The method generates representative samples by training on the original samples, and then repeatedly adjusts the representative samples to enhance their representative ability, according to the distribution of both the original training samples and the generated representative samples. In this way the original training corpus is compressed effectively, so classification efficiency improves substantially. The approach also makes the distribution of representative samples more even, which improves classification performance. Experiments show that the approach performs well.
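The general idea of classifying against a compressed set of representatives can be illustrated with a minimal sketch. The abstract does not specify the generation or adjustment rules, so the code below substitutes a simple stand-in: a k-means-style refinement within each class, using cosine similarity. The function names (`build_representatives`, `classify`) and the parameters `per_class` and `rounds` are hypothetical and do not come from the paper.

```python
import math
from collections import Counter, defaultdict


def cosine(u, v):
    """Cosine similarity between two equal-length term-weight vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Component-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def build_representatives(samples, labels, per_class=2, rounds=5):
    """Compress each class into at most `per_class` representatives.

    Assumption: a k-means-style loop stands in for the paper's
    generation and adjustment steps, which the abstract does not detail.
    """
    by_class = defaultdict(list)
    for x, y in zip(samples, labels):
        by_class[y].append(x)

    reps = []
    for y, xs in by_class.items():
        # Seed the representatives with the first few samples of the class.
        centers = [list(x) for x in xs[:per_class]]
        for _ in range(rounds):
            # Assign each sample to its most similar representative.
            groups = [[] for _ in centers]
            for x in xs:
                best = max(range(len(centers)),
                           key=lambda i: cosine(x, centers[i]))
                groups[best].append(x)
            # Move each representative to the mean of its assigned samples.
            centers = [centroid(g) if g else c
                       for g, c in zip(groups, centers)]
        reps.extend((c, y) for c in centers)
    return reps


def classify(x, reps, k=1):
    """kNN vote over the representatives instead of the full corpus."""
    nearest = sorted(reps, key=lambda r: cosine(x, r[0]), reverse=True)[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]
```

With `per_class` much smaller than the class size, each query is compared against far fewer vectors than in plain kNN, which is the source of the speed-up the abstract describes. How well accuracy is preserved depends on the actual adjustment rule, which this sketch does not reproduce.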