Predicting firm failure in the software industry

The firm failure rate in the software industry is significantly higher than in other industries. Because software products and services are so widely used, failure in the software industry has implications for the industry itself as well as for the economy at the local, national, and global levels. This study compares the classification performance of thirteen approaches in predicting firm failure in the US software industry. Seven measures are used to evaluate the classifiers’ performance. We use the synthetic minority oversampling technique (SMOTE), SMOTEBoost, and SMOTEBagging to address the data imbalance issue. To give managers enough time to develop strategies and take the necessary actions to reduce the likelihood of failure, we use 20 financial indicators collected 4 years before the last available date for each firm. Our findings show that embedding SMOTE into boosting and bagging algorithms performs better than preprocessing the data with SMOTE before training the classifier. According to the sensitivity analysis, research and development expense is the most significant predictor of firm failure, followed by net sales and total revenue. Managers can use our results as a decision support tool to identify high-risk firms at an early stage and take the necessary actions to prevent a firm from failing. Early prediction of firm failure will allow software firms to modularize their products or services into specific “features” and offer them as “digital services” using new business models, or to combine these services with partner firms’ services to create new products and address evolving customer expectations. Moreover, early prediction of firm failure in the software industry calls on firms, both new and those in the growth stage, to componentize their design for adaptability and to build agility in the way they use their resource mix to address both market gaps and operational gaps.
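To make the oversampling step concrete, the following is a minimal sketch of the core SMOTE idea: each synthetic minority example is placed at a random point on the line segment between a minority example and one of its nearest minority neighbours. It is not the authors' implementation. The function name `smote`, its parameters, and the toy two-feature data are assumptions made purely for illustration. Used as a preprocessing step, such a function would run once before training; in the embedded variants (SMOTEBoost, SMOTEBagging), the resampling is instead repeated inside each boosting round or bootstrap sample.

```python
import math
import random


def smote(minority, n_synthetic, k=5, seed=0):
    """Create n_synthetic points by interpolating between minority neighbours.

    Hypothetical helper for illustration only; names and defaults are assumed.
    """
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    if k < 1:
        raise ValueError("need at least two minority samples")
    synthetic = []
    for _ in range(n_synthetic):
        base = rng.choice(minority)
        # Rank the other minority points by Euclidean distance to the base point.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Toy data (assumed): 3 failed firms (minority) vs 9 survivors, two made-up ratios.
failed = [[0.10, 0.80], [0.20, 0.90], [0.15, 0.70]]
survived = [[0.5 + 0.05 * i, 0.2 + 0.02 * i] for i in range(9)]

# Oversample the minority class until both classes have the same size.
new_failed = smote(failed, n_synthetic=len(survived) - len(failed), k=2)
balanced_failed = failed + new_failed
```

Because every synthetic point is an interpolation between two real minority points, it stays inside the region the minority class already occupies, which is what distinguishes SMOTE from simply duplicating minority rows.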
