A Hybrid Public Opinion Analysis Method Based on Improved Clustering and Mutual Information

The Internet is widely used as a medium for exchanging information and opinions, so public opinion analysis is essential for understanding and guiding people's opinions. This paper proposes a hybrid public opinion analysis method based on improved clustering and mutual information. During feature extraction, word weights are adjusted according to part-of-speech tagging to reduce the dimensionality of the original texts. For clustering, an improved density peak algorithm is combined with a binary search algorithm to determine the number of clusters K and the initial centers for KMeans. Hot-word extraction, sentiment analysis, and trend analysis are then performed for each cluster using mutual information, mining useful knowledge to support decision-making. Extensive experiments conducted on Hadoop show that the proposed hybrid public opinion analysis method is effective and of practical significance.
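To illustrate how density peaks can supply both K and the initial centers for KMeans, the following is a minimal sketch, not the paper's implementation. It uses the standard density peak quantities: the local density rho (the number of neighbours within a cutoff distance dc) and delta (the distance to the nearest point of higher density). It then ranks points by gamma = rho * delta. As an assumption, K is chosen at the largest gap in the sorted gamma values, which stands in for the paper's binary-search step because the abstract does not describe that step. The function names, the cutoff `dc`, the `max_k` limit, and the toy 2-D points are hypothetical. In the paper, the inputs would be the POS-weighted text vectors.

```python
import math
from typing import List, Sequence, Tuple

Point = Sequence[float]


def density_peaks(points: List[Point], dc: float) -> Tuple[List[int], List[float]]:
    """Return (rho, delta) for each point using a cutoff kernel of radius dc."""
    n = len(points)
    # rho_i: number of other points closer than the cutoff distance dc
    rho = [sum(1 for j in range(n) if j != i and math.dist(points[i], points[j]) < dc)
           for i in range(n)]
    # Order by density (ties broken by index) so "higher density" is well defined
    order = sorted(range(n), key=lambda i: (-rho[i], i))
    delta = [0.0] * n
    for rank, i in enumerate(order):
        if rank == 0:
            # Densest point: by convention, its maximum distance to any point
            delta[i] = max(math.dist(points[i], p) for p in points)
        else:
            # Distance to the nearest point that precedes it in density order
            delta[i] = min(math.dist(points[i], points[j]) for j in order[:rank])
    return rho, delta


def choose_centers(points: List[Point], dc: float, max_k: int = 10) -> List[Tuple[float, ...]]:
    """Pick K and the initial centers from gamma = rho * delta (largest-gap heuristic)."""
    rho, delta = density_peaks(points, dc)
    gamma = [r * d for r, d in zip(rho, delta)]
    ranked = sorted(range(len(points)), key=lambda i: -gamma[i])
    limit = min(max_k, len(points) - 1)
    # gaps[k-1] is the drop in gamma between the k-th and (k+1)-th ranked points
    gaps = [gamma[ranked[k - 1]] - gamma[ranked[k]] for k in range(1, limit + 1)]
    k = gaps.index(max(gaps)) + 1
    return [tuple(points[i]) for i in ranked[:k]]


def kmeans(points: List[Point], centers: List[Tuple[float, ...]], iters: int = 100):
    """Plain Lloyd's KMeans seeded with the given centers."""
    centers = [tuple(c) for c in centers]
    labels: List[int] = []
    for _ in range(iters):
        # Assign each point to its nearest current center
        labels = [min(range(len(centers)), key=lambda c: math.dist(p, centers[c]))
                  for p in points]
        new_centers = []
        for c in range(len(centers)):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append(tuple(sum(x) / len(members) for x in zip(*members)))
            else:
                new_centers.append(centers[c])  # keep an empty cluster's center
        if new_centers == centers:
            break
        centers = new_centers
    return labels, centers


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (1, 0), (1, 1), (0.5, 0.5),
           (10, 10), (10, 11), (11, 10), (11, 11), (10.5, 10.5)]
    init = choose_centers(pts, dc=1.5)
    labels, centers = kmeans(pts, init)
    print(len(init), labels, centers)
```

On the toy data, the two densest well-separated points receive much larger gamma values than the rest, so the largest gap yields K = 2. KMeans then converges in two passes. Seeding KMeans this way avoids random initialization, which is the motivation the abstract gives for combining the two algorithms.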