Predicting Indian Stock Market Using the Psycho-Linguistic Features of Financial News

Financial forecasting using news articles is an emerging field. In this paper, we propose hybrid intelligent models for stock market prediction that use psycholinguistic variables (LIWC and TAALES) extracted from news articles as predictor variables. For prediction, we employ several intelligent techniques: Multilayer Perceptron, Group Method of Data Handling (GMDH), General Regression Neural Network (GRNN), Random Forest, Quantile Regression Random Forest, Classification and Regression Tree, and Support Vector Regression. We experiment on stock data for 12 companies listed on the Bombay Stock Exchange. We apply chi-squared and maximum-relevance minimum-redundancy feature selection to the psycholinguistic features obtained from the news articles. After extensive experimentation, and based on the Diebold-Mariano test, we conclude that GMDH and GRNN are statistically the best techniques, in that order, with respect to MAPE and NRMSE values.
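The evaluation criteria named in the abstract can be illustrated with a minimal sketch. The abstract does not give the exact formulas, so the choices below are assumptions: NRMSE is normalised by the range of the actual values (other conventions divide by the mean), and the Diebold-Mariano statistic uses squared-error loss at a one-step horizon with no autocorrelation correction. The function names are hypothetical and not taken from the paper.

```python
import math


def mape(actual, predicted):
    """Mean absolute percentage error, in percent.

    Assumes no actual value is zero.
    """
    n = len(actual)
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / n


def nrmse(actual, predicted):
    """Root mean squared error divided by the range of the actual values.

    Range normalisation is an assumption; the paper may use another convention.
    """
    n = len(actual)
    mse = sum((a - p) ** 2 for a, p in zip(actual, predicted)) / n
    return math.sqrt(mse) / (max(actual) - min(actual))


def dm_statistic(errors_1, errors_2):
    """Simplified Diebold-Mariano statistic for two forecast error series.

    Uses squared-error loss and a one-step horizon (assumption), so the
    variance of the loss differential needs no autocovariance terms.
    A positive value means model 2 has the lower average loss.
    """
    d = [e1 ** 2 - e2 ** 2 for e1, e2 in zip(errors_1, errors_2)]
    n = len(d)
    mean_d = sum(d) / n
    var_d = sum((x - mean_d) ** 2 for x in d) / n
    return mean_d / math.sqrt(var_d / n)
```

Under the null hypothesis of equal predictive accuracy the statistic is approximately standard normal, so an absolute value above roughly 1.96 would indicate a significant difference at the 5% level.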
