Inducing stock market lexicons from disparate Chinese texts

The purpose of this paper is to propose a methodology for constructing a stock market sentiment lexicon by incorporating domain-specific knowledge extracted from diverse Chinese media outlets.

This paper presents a novel method to automatically generate financial lexicons using a unique data set that comprises news articles, analyst reports and social media. Specifically, a keyword-extraction method is used to build a high-quality seed lexicon, and an ensemble mechanism is developed to integrate the knowledge derived from the distinct language sources. Two different methods, Pointwise Mutual Information and Word2vec, are applied to capture word associations. Finally, an evaluation procedure validates the effectiveness of the method against four traditional lexicons.

The experimental results on three real-world testing data sets show that the ensemble lexicons significantly improve sentiment classification performance compared with the four baseline lexicons, suggesting that knowledge derived from diverse media is useful in domain-specific lexicon generation and the corresponding sentiment analysis tasks.

This work appears to be the first to construct financial sentiment lexicons from over 2 million posts and headlines collected from more than one language source. Furthermore, the authors believe that the data set established in this study is one of the largest corpora used for Chinese stock market lexicon acquisition. This work is valuable for extracting collective sentiment from multiple media sources and for providing decision-making support to stock market participants.
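As an illustration of how Pointwise Mutual Information can capture word associations around a seed lexicon, the following is a minimal sketch, not the authors' implementation. It assumes document-level co-occurrence, a semantic-orientation score defined as the sum of PMI with positive seeds minus the sum of PMI with negative seeds, and a PMI of zero for pairs that never co-occur. The function names, the toy corpus and the seed words are all hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def cooccurrence_stats(docs):
    """Count document frequencies of words and of unordered word pairs."""
    word_df = Counter()
    pair_df = Counter()
    for doc in docs:
        vocab = set(doc)
        word_df.update(vocab)
        pair_df.update(frozenset(p) for p in combinations(sorted(vocab), 2))
    return len(docs), word_df, pair_df


def pmi(w1, w2, stats):
    """PMI(w1, w2) = log2( P(w1, w2) / (P(w1) * P(w2)) ) over documents."""
    n, word_df, pair_df = stats
    joint = pair_df[frozenset((w1, w2))]
    if joint == 0:
        return 0.0  # assumption: unseen pairs contribute no association
    return math.log2(joint * n / (word_df[w1] * word_df[w2]))


def sentiment_orientation(word, pos_seeds, neg_seeds, stats):
    """Positive score: closer to positive seeds; negative: closer to negative seeds."""
    pos = sum(pmi(word, s, stats) for s in pos_seeds if s != word)
    neg = sum(pmi(word, s, stats) for s in neg_seeds if s != word)
    return pos - neg


# Hypothetical, pre-tokenized toy corpus (real input would be segmented posts/headlines).
toy_docs = [
    ["上涨", "利好", "突破"],
    ["上涨", "突破"],
    ["下跌", "利空", "跳水"],
    ["下跌", "跳水"],
]
toy_stats = cooccurrence_stats(toy_docs)
pos_seeds, neg_seeds = {"上涨"}, {"下跌"}  # hypothetical seed lexicon

for candidate in ["突破", "跳水"]:
    print(candidate, sentiment_orientation(candidate, pos_seeds, neg_seeds, toy_stats))
```

In this toy corpus, a candidate that co-occurs only with the positive seed receives a positive score and one that co-occurs only with the negative seed receives a negative score. A Word2vec-based variant would replace `pmi` with a similarity between learned word vectors; how the paper combines the two measures across media sources is not specified in the abstract.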
