A Hybrid Approach Using Oversampling Technique and Cost-Sensitive Learning for Bankruptcy Prediction

Identifying companies at risk of bankruptcy is extremely important for business owners, banks, governments, securities investors, and other economic stakeholders who want to optimize profitability and minimize investment risk. Many studies have addressed bankruptcy prediction by applying different machine learning approaches to various datasets around the world. Because bankruptcy datasets suffer from the class imbalance problem, special techniques are needed to improve prediction performance. Oversampling and cost-sensitive learning are two common methods for dealing with class imbalance, and each of them improves predictive performance when used on its own. However, for datasets with very small balancing ratios, combining the two techniques produces better results. Therefore, this study develops a hybrid approach using an oversampling technique and cost-sensitive learning, named HAOC, for bankruptcy prediction on the Korean Bankruptcy dataset. The first module of HAOC is an oversampling module that uses an optimal balancing ratio, determined in the first experiment as the ratio giving the best overall performance on the validation set. The second module then uses a cost-sensitive learning model, the CBoost algorithm, to predict bankruptcy. The experimental results show that HAOC achieves the best performance for bankruptcy prediction compared with the existing approaches.
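
The two-module idea (oversample the minority class up to a chosen balancing ratio, then train with class-dependent misclassification costs) can be illustrated with a minimal sketch. This is not the HAOC implementation: plain random duplication stands in for the oversampling technique, inverse-class-frequency costs stand in for the cost model of CBoost, and the function names, the ratio of 0.5, and the toy data are all hypothetical choices made for illustration.

```python
import random
from collections import Counter


def oversample_to_ratio(X, y, ratio, minority_label=1, seed=0):
    """Duplicate random minority rows until n_minority / n_majority ~= ratio.

    Assumption: random duplication is used here only as a simple stand-in
    for whichever oversampling technique the study actually applies.
    """
    rng = random.Random(seed)
    minority = [i for i, label in enumerate(y) if label == minority_label]
    if not minority:
        raise ValueError("no minority samples to oversample")
    n_majority = len(y) - len(minority)
    target = round(ratio * n_majority)          # desired minority count
    extra = max(0, target - len(minority))      # never remove samples
    picks = [rng.choice(minority) for _ in range(extra)]
    X_new = list(X) + [X[i] for i in picks]
    y_new = list(y) + [y[i] for i in picks]
    return X_new, y_new


def class_costs(y):
    """Cost per class, inversely proportional to class frequency.

    Assumption: cost = n / (k * count), a common heuristic; the cost
    scheme of CBoost is not described in the abstract.
    """
    counts = Counter(y)
    n, k = len(y), len(counts)
    return {label: n / (k * c) for label, c in counts.items()}


def total_cost(y_true, y_pred, costs):
    """Sum the cost of the true class over every misclassified sample."""
    return sum(costs[t] for t, p in zip(y_true, y_pred) if t != p)


# Toy data: 10 bankrupt (label 1) and 90 non-bankrupt (label 0) firms.
X = [[float(i)] for i in range(100)]
y = [1] * 10 + [0] * 90

# Module 1: oversample to a hypothetical balancing ratio of 0.5.
X_res, y_res = oversample_to_ratio(X, y, ratio=0.5)

# Module 2: derive class costs that a cost-sensitive learner could use.
costs = class_costs(y_res)
print(Counter(y_res), costs)
```

In this sketch the residual imbalance left after oversampling is what the class costs compensate for, which is the motivation for combining the two steps rather than pushing either one to its extreme. In practice the balancing ratio would be selected on a validation set, as the abstract describes.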
