Improved KNN text classification algorithm with MapReduce implementation

The classic K-Nearest Neighbor (KNN) algorithm is widely used in text classification. This paper proposes an efficient text classification algorithm that improves on the traditional TF-IDF-based KNN classifier. It also introduces a MapReduce parallel implementation of the new algorithm on the Hadoop platform, which increases the capacity of KNN to process large data sets. Experimental results show that the new algorithm, implemented through MapReduce, not only improves classification accuracy but also offers fast convergence and good scalability.
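
For orientation, the traditional baseline that the abstract refers to can be sketched as follows. This is a minimal illustration of plain TF-IDF KNN arranged in a map/reduce shape, not the improved algorithm proposed in the paper and not Hadoop code. All function names, the cosine similarity measure, the `log(n/df) + 1` IDF variant, and the in-process simulation of input splits are assumptions made for the example.

```python
import heapq
import math
from collections import Counter


def build_idf(docs):
    """Compute IDF weights from a list of token lists (assumed variant: log(n/df) + 1)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log(n / count) + 1.0 for term, count in df.items()}


def vectorize(tokens, idf):
    """Turn a token list into a sparse TF-IDF vector; unseen terms are dropped."""
    tf = Counter(tokens)
    return {t: (c / len(tokens)) * idf[t] for t, c in tf.items() if t in idf}


def cosine(a, b):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def map_phase(split, query_vec, k):
    """Map step: score one split of the training set and emit its local top-k."""
    scored = [(cosine(query_vec, vec), label) for vec, label in split]
    return heapq.nlargest(k, scored)


def reduce_phase(partials, k):
    """Reduce step: merge local top-k lists into a global top-k and take a majority vote."""
    merged = heapq.nlargest(k, (pair for part in partials for pair in part))
    votes = Counter(label for _, label in merged)
    return votes.most_common(1)[0][0]


def classify(train, query_tokens, k=3, n_splits=2):
    """Classify one document; train is a list of (tokens, label) pairs."""
    idf = build_idf([tokens for tokens, _ in train])
    vectors = [(vectorize(tokens, idf), label) for tokens, label in train]
    query_vec = vectorize(query_tokens, idf)
    # Simulate input splits; on a cluster each split would go to its own mapper.
    splits = [vectors[i::n_splits] for i in range(n_splits)]
    partials = [map_phase(split, query_vec, k) for split in splits]
    return reduce_phase(partials, k)


train = [
    (["ball", "goal", "team"], "sports"),
    (["team", "match", "goal"], "sports"),
    (["stock", "market", "price"], "finance"),
    (["market", "trade", "price"], "finance"),
]
label = classify(train, ["goal", "team", "win"], k=3)
```

Because each mapper only emits its local top-k candidates, the reducer merges at most k entries per split, which is the usual reason this decomposition scales with the size of the training set.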