A public opinion classification algorithm based on micro-blog text sentiment intensity: Design and implementation

Based on the short content and near-real-time broadcast speed of micro-blog information, our lab constructed a public opinion corpus named the MPO Corpus. Building on an analysis of the state of network public opinion, this paper proposes an approach that calculates sentiment intensity at three levels: words, sentences, and documents. Furthermore, using the MPO Corpus, the HowNet knowledge base, and a sentiment analysis set, the semantic information of feature words is incorporated into the traditional vector space model to represent micro-blog documents. The documents are then classified by subject and sentiment intensity. The experimental results indicate that the proposed method improves the efficiency and accuracy of micro-blog content classification and of public opinion characteristic analysis and supervision. It thus provides better technical support for content auditing and public opinion monitoring on micro-blog platforms.
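The three-level idea (word, sentence, document) can be illustrated with a minimal sketch. This is not the paper's algorithm: the lexicon, the degree and negation word lists, the function names, and the aggregation rules (modifiers scale the next sentiment word; the document score is the mean of sentence scores) are all assumptions chosen for illustration, and nothing here is taken from the MPO Corpus or HowNet.

```python
from statistics import mean

# Hypothetical lexicon: word -> intensity in [-1, 1] (illustrative values only).
LEXICON = {"good": 0.6, "great": 0.9, "bad": -0.6, "terrible": -0.9}
# Hypothetical degree modifiers and negation words (assumptions, not from the paper).
DEGREE = {"very": 1.5, "slightly": 0.5}
NEGATION = {"not", "never"}


def word_intensity(word):
    """Word level: look up the intensity; unknown words count as neutral."""
    return LEXICON.get(word, 0.0)


def sentence_intensity(tokens):
    """Sentence level: sum word intensities, each scaled by the negation
    and degree modifiers seen since the previous sentiment word."""
    total = 0.0
    scale = 1.0
    for tok in tokens:
        if tok in NEGATION:
            scale *= -1.0
        elif tok in DEGREE:
            scale *= DEGREE[tok]
        else:
            score = word_intensity(tok)
            if score != 0.0:
                total += scale * score
                scale = 1.0  # modifiers are consumed by the sentiment word
    return total


def document_intensity(sentences):
    """Document level: mean of the sentence scores (0.0 for an empty document)."""
    scores = [sentence_intensity(s) for s in sentences]
    return mean(scores) if scores else 0.0


if __name__ == "__main__":
    doc = [["not", "bad"], ["very", "good"], ["terrible"]]
    print(document_intensity(doc))
```

In this sketch a tokenized micro-blog post is a list of sentences, each a list of tokens; a real system would need Chinese word segmentation and a calibrated lexicon before any such scoring.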
