A Method of Text Classification Combining Naive Bayes and the Similarity Computing Algorithms

Text classification is one of the main problems in big data analysis and research. At present, however, there is no universal algorithm model that meets the requirements of both accuracy and efficiency in text classification. This paper proposes a text classification method that combines Naive Bayes with a similarity computing algorithm. First, the text is segmented into word vectors by the Paoding Analyzer; then the Bayesian algorithm is used to assign the text to a first-level directory; after that, an improved similarity computing algorithm is used to carry out the second-level directory classification. Finally, the algorithm model is tested on real data, and the results are compared with those of the Bayesian algorithm alone and of the similarity computing algorithm alone. The results show that the proposed method achieves a higher precision rate.
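
The two-level pipeline described above can be illustrated with a minimal Python sketch. It rests on several assumptions that are not taken from the paper: whitespace tokenization stands in for the Paoding Analyzer, the first level is a multinomial Naive Bayes with add-one smoothing, and the second level uses plain cosine similarity against term-frequency centroids in place of the improved similarity algorithm, whose details the abstract does not give. All names (`tokenize`, `NaiveBayes`, `train`, `classify`) and the toy training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumption: whitespace splitting stands in for a real word segmenter.
    return text.lower().split()


class NaiveBayes:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def __init__(self):
        self.class_docs = Counter()              # documents per class
        self.word_counts = defaultdict(Counter)  # word counts per class
        self.vocab = set()

    def fit(self, docs, labels):
        for tokens, label in zip(docs, labels):
            self.class_docs[label] += 1
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)

    def predict(self, tokens):
        total_docs = sum(self.class_docs.values())
        vocab_size = len(self.vocab)
        best_label, best_score = None, -math.inf
        for label, n_docs in self.class_docs.items():
            total_words = sum(self.word_counts[label].values())
            score = math.log(n_docs / total_docs)  # log prior
            for t in tokens:
                count = self.word_counts[label][t]
                score += math.log((count + 1) / (total_words + vocab_size))
            if score > best_score:
                best_label, best_score = label, score
        return best_label


def cosine(a, b):
    # Cosine similarity between two sparse term-frequency vectors.
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def train(samples):
    # samples: iterable of (text, first_level_label, second_level_label)
    nb = NaiveBayes()
    centroids = defaultdict(lambda: defaultdict(Counter))
    docs, labels = [], []
    for text, first, second in samples:
        tokens = tokenize(text)
        docs.append(tokens)
        labels.append(first)
        centroids[first][second].update(tokens)  # summed term frequencies
    nb.fit(docs, labels)
    return nb, centroids


def classify(text, nb, centroids):
    tokens = tokenize(text)
    first = nb.predict(tokens)  # first-level directory via Naive Bayes
    vec = Counter(tokens)
    candidates = centroids[first]  # only subcategories under that directory
    second = max(candidates, key=lambda s: cosine(vec, candidates[s]))
    return first, second


# Toy data, invented purely for illustration.
TRAIN = [
    ("football match goal team", "sports", "football"),
    ("football league striker goal", "sports", "football"),
    ("basketball court dunk team", "sports", "basketball"),
    ("basketball playoffs rebound dunk", "sports", "basketball"),
    ("stock market shares investor", "finance", "stocks"),
    ("stock index shares rally", "finance", "stocks"),
    ("bank loan interest rate", "finance", "banking"),
    ("bank deposit interest account", "finance", "banking"),
]

nb, centroids = train(TRAIN)
print(classify("team scores goal in football match", nb, centroids))
print(classify("bank raises interest rate on loan", nb, centroids))
```

Restricting the similarity comparison to the subcategories under the predicted first-level directory is one plausible reading of how the two stages fit together: the inexpensive Bayesian step narrows the candidate set, so the similarity step compares the text against fewer classes.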