Financial Forecasting Using Character N-Gram Analysis and Readability Scores of Annual Reports

Two novel Natural Language Processing (NLP) classification techniques are applied to the analysis of corporate annual reports for the task of financial forecasting. The hypothesis is that the textual content of annual reports contains vital information for assessing the performance of the company's stock over the following year. The first method generates a character n-gram profile for each annual report and then labels the report using the Common N-Gram (CNG) classification method. The second method draws on a more traditional approach, in which readability scores are combined with performance inputs and supplied to a support vector machine (SVM) for classification. Both methods consistently outperformed a benchmark portfolio, and their combination proved even more effective and efficient: the combined models yielded the highest returns with the fewest trades.
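As an illustration of the first method, the following is a minimal sketch of character n-gram profiling with a CNG-style nearest-profile classifier. It is not the authors' implementation. The function names, the n-gram length `n=3`, the profile size `size=1000`, and the class labels in the usage note are all hypothetical choices made for illustration, and the dissimilarity shown is the relative-difference measure commonly associated with CNG, assumed here rather than taken from this abstract.

```python
from collections import Counter


def ngram_profile(text, n=3, size=1000):
    """Return the `size` most frequent character n-grams of `text`,
    mapped to their normalized frequencies (assumed profile definition)."""
    grams = Counter(text[i:i + n] for i in range(len(text) - n + 1))
    total = sum(grams.values())
    if total == 0:
        return {}
    return {g: c / total for g, c in grams.most_common(size)}


def cng_dissimilarity(p1, p2):
    """Sum of squared relative frequency differences over the union of
    n-grams in both profiles; an n-gram missing from a profile counts as 0."""
    total = 0.0
    for g in set(p1) | set(p2):
        f1, f2 = p1.get(g, 0.0), p2.get(g, 0.0)
        # f1 + f2 > 0 because g occurs in at least one profile.
        total += ((f1 - f2) / ((f1 + f2) / 2)) ** 2
    return total


def classify(report_text, class_profiles, n=3, size=1000):
    """Assign the label whose class profile is least dissimilar to the report."""
    profile = ngram_profile(report_text, n, size)
    return min(
        class_profiles,
        key=lambda label: cng_dissimilarity(profile, class_profiles[label]),
    )
```

In use, one profile per class (for example, hypothetical labels such as "over" and "under" for stocks that over- or under-performed) would be built from the concatenated training reports of that class, and each new report would be passed to `classify` together with those class profiles.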
