Research and Realization of Internet Public Opinion Analysis Based on an Improved TF-IDF Algorithm

At present, network public opinion analysis typically proceeds through data acquisition, information extraction, spam filtering, similarity clustering, sentiment analysis, and positive/negative polarity judgment. Extracting information from the data by means of text feature extraction is a key step. In this paper, the traditional TF-IDF method is improved by introducing a part-of-speech weight coefficient and a position weight (span weight) for each feature word. The experimental results show that the improved method improves the clustering of feature words and better reflects the characteristics of the text. Applied to a public opinion analysis system, it achieved good results.
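The weighting idea described above can be illustrated with a minimal Python sketch. It is not the paper's exact formula, which the abstract does not give. The sketch assumes that the part-of-speech weight and the position weight simply multiply the classical TF-IDF score, and that position means "appears in the title" versus "appears only in the body". The function name weighted_tf_idf, the weight tables, and all coefficient values are hypothetical and chosen only for illustration.

```python
import math
from collections import Counter

# Hypothetical coefficients: the abstract does not state the actual values.
POS_WEIGHT = {"noun": 1.0, "verb": 0.8, "adj": 0.6}
DEFAULT_POS_WEIGHT = 0.4   # assumed weight for any other part of speech
TITLE_WEIGHT = 2.0         # assumed position weight for words seen in the title
BODY_WEIGHT = 1.0          # assumed position weight for words seen only in the body


def weighted_tf_idf(docs):
    """Score words with TF-IDF scaled by part-of-speech and position weights.

    docs: list of documents, each a list of (word, pos, in_title) tuples.
    Returns one {word: score} dict per document.
    """
    n_docs = len(docs)

    # Document frequency: in how many documents does each word occur?
    df = Counter()
    for doc in docs:
        df.update({word for word, _, _ in doc})

    all_scores = []
    for doc in docs:
        total = len(doc)
        tf = Counter(word for word, _, _ in doc)

        # For each word keep the largest weight seen among its occurrences.
        pos_w, loc_w = {}, {}
        for word, pos, in_title in doc:
            pw = POS_WEIGHT.get(pos, DEFAULT_POS_WEIGHT)
            lw = TITLE_WEIGHT if in_title else BODY_WEIGHT
            pos_w[word] = max(pos_w.get(word, 0.0), pw)
            loc_w[word] = max(loc_w.get(word, 0.0), lw)

        scores = {}
        for word, count in tf.items():
            # Adding 1 keeps words that occur in every document above zero.
            idf = math.log(n_docs / df[word]) + 1.0
            scores[word] = (count / total) * idf * pos_w[word] * loc_w[word]
        all_scores.append(scores)
    return all_scores
```

Under these assumptions, a noun that appears in a title outranks an adverb that appears only in the body even when both have the same raw frequency, which is the effect the two weights are meant to capture.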