Using a contextual entropy model to expand emotion words and their intensity for the sentiment classification of stock market news

Sentiment classification of stock market news involves identifying positive and negative news articles. It is an emerging technique for predicting stock trends, which can facilitate investor decision making. In this paper, we propose using the presence and intensity of emotion words as features to classify the sentiment of stock market news articles. To identify such words and their intensity, we develop a contextual entropy model that expands a set of seed words generated from a small corpus of sentiment-annotated stock market news articles. The contextual entropy model measures the similarity between two words by comparing their contextual distributions using an entropy measure, allowing for the discovery of words similar to the seed words. Experimental results show that the proposed method can discover more useful emotion words and their corresponding intensity, thus improving classification performance. Performance was further improved by incorporating intensity into the classification, and the proposed method outperformed previously proposed expansion methods based on pointwise mutual information (PMI).
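
The idea of comparing two words through their contextual distributions can be illustrated with a minimal sketch. The abstract does not specify the exact entropy measure, so the code below assumes a symmetric Kullback-Leibler divergence over add-alpha smoothed co-occurrence counts; the function names, the window size, the smoothing constant, and the toy corpus are all hypothetical choices made for illustration and are not taken from the paper.

```python
import math
from collections import Counter


def context_distribution(word, sentences, window=2):
    """Count the words that co-occur with `word` within +/- `window` tokens."""
    counts = Counter()
    for tokens in sentences:
        for i, tok in enumerate(tokens):
            if tok != word:
                continue
            lo = max(0, i - window)
            hi = min(len(tokens), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[tokens[j]] += 1
    return counts


def smoothed_probs(counts, vocab, alpha=0.1):
    """Turn raw counts into probabilities over `vocab` with add-alpha smoothing."""
    total = sum(counts.values()) + alpha * len(vocab)
    return {w: (counts[w] + alpha) / total for w in vocab}


def kl_divergence(p, q):
    """KL(p || q) for two distributions defined over the same keys."""
    return sum(p[w] * math.log(p[w] / q[w]) for w in p)


def contextual_distance(word_a, word_b, sentences, window=2, alpha=0.1):
    """Symmetric KL divergence between the context distributions of two words.

    Assumed measure, not necessarily the paper's; lower values mean the two
    words appear in more similar contexts.
    """
    counts_a = context_distribution(word_a, sentences, window)
    counts_b = context_distribution(word_b, sentences, window)
    vocab = sorted(set(counts_a) | set(counts_b))
    if not vocab:
        return math.inf  # neither word occurs, so nothing can be compared
    p = smoothed_probs(counts_a, vocab, alpha)
    q = smoothed_probs(counts_b, vocab, alpha)
    return 0.5 * (kl_divergence(p, q) + kl_divergence(q, p))


# Hypothetical toy corpus: "surge" and "rally" share contexts, "plunge" does not.
toy_corpus = [
    "shares surge on strong earnings".split(),
    "shares rally on strong earnings".split(),
    "shares plunge on weak earnings".split(),
]
seed = "surge"
candidates = ["rally", "plunge"]
ranked = sorted(candidates, key=lambda w: contextual_distance(seed, w, toy_corpus))
```

In this sketch, candidates are ranked by their distance to a seed word, so words with the most similar contexts come first. A seed-expansion step could then keep the candidates whose distance falls below some threshold, although how the paper selects candidates and derives intensity is not stated in the abstract.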
