A new random subspace method incorporating sentiment and textual information for financial distress prediction

Abstract Financial distress prediction aims to provide early warning signals for corporate governance and has been widely recognized as a promising way to reduce financial losses. However, previous research has often neglected non-financial predictive information, such as sentiment and textual information, as well as the class-imbalance problem. Therefore, a random subspace method incorporating sentiment and textual information (IST-RS) is proposed for financial distress prediction. Sentiment and textual features are extracted as non-financial features and integrated with conventional financial features. To deal with the high-dimensionality and class-imbalance problems, the ensemble random subspace method is adopted and improved by fusing it with the lasso regularized sparse method. Experiments on a dataset derived from the China Security Market Accounting Research Database (CSMAR) were conducted to verify the effectiveness and feasibility of IST-RS. The results indicate that the proposed approach significantly improves the performance of financial distress prediction. Moreover, the proposed approach outperforms the benchmark methods on high-dimensional datasets, which demonstrates that it is suitable for simultaneously solving the high-dimensionality and class-imbalance problems in financial distress prediction.
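
The abstract does not say how the lasso step is fused into the random subspace ensemble, so the following is only a minimal sketch of the general idea and not the authors' IST-RS implementation. It assumes that each ensemble member is an L1-penalized (lasso-style) logistic regression trained on a random subset of the feature columns, and that member probabilities are averaged. All function names and parameters (`fit_l1_logistic`, `fit_random_subspace`, `predict_proba`, `lam`, `subspace_frac`) are hypothetical, and no class-imbalance handling is shown.

```python
import numpy as np


def _sigmoid(z):
    # Clip the logits to avoid overflow in exp for extreme values.
    return 1.0 / (1.0 + np.exp(-np.clip(z, -30.0, 30.0)))


def _soft_threshold(w, t):
    # Proximal operator of the L1 penalty: shrinks weights toward zero.
    return np.sign(w) * np.maximum(np.abs(w) - t, 0.0)


def fit_l1_logistic(X, y, lam=0.05, lr=0.1, n_iter=300):
    """L1-penalized logistic regression fitted by proximal gradient descent.

    Assumption: this stands in for the 'lasso regularized sparse method';
    the source does not specify the base learner or the solver.
    """
    n, d = X.shape
    w = np.zeros(d)
    b = 0.0
    for _ in range(n_iter):
        residual = _sigmoid(X @ w + b) - y
        w = _soft_threshold(w - lr * (X.T @ residual) / n, lr * lam)
        b -= lr * residual.mean()  # the intercept is not penalized
    return w, b


def fit_random_subspace(X, y, n_members=25, subspace_frac=0.5, lam=0.05, seed=0):
    """Train one sparse model per randomly drawn feature subset."""
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    k = max(1, int(subspace_frac * d))
    members = []
    for _ in range(n_members):
        idx = rng.choice(d, size=k, replace=False)  # random feature subspace
        w, b = fit_l1_logistic(X[:, idx], y, lam=lam)
        members.append((idx, w, b))
    return members


def predict_proba(members, X):
    """Average the members' predicted probabilities of the positive class."""
    probs = [_sigmoid(X[:, idx] @ w + b) for idx, w, b in members]
    return np.mean(probs, axis=0)
```

In such a setup, the columns of `X` would hold the financial, sentiment, and textual features side by side, so each member sees a different mix of the three feature groups while the L1 penalty prunes uninformative columns within its subspace.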
