Unsupervised quadratic surface support vector machine with application to credit risk assessment

Abstract: Unsupervised classification is an important task in machine learning. Although the support vector machine (SVM) has achieved great success in supervised classification, it is much less used to classify unlabeled data points, and existing SVM-based clustering methods suffer from several drawbacks, including sensitivity to nonlinear kernels and random initializations, high computational cost, and unsuitability for imbalanced datasets. In this paper, to exploit the advantages of SVM and overcome the drawbacks of SVM-based clustering methods, we propose a new two-stage unsupervised classification method that requires no initialization. First, a new unsupervised kernel-free quadratic surface SVM (QSSVM) model is proposed to avoid selecting kernels and related kernel parameters; second, a golden-section algorithm is designed to generate an appropriate classifier for both balanced and imbalanced data. By studying certain properties of the proposed model, we develop a convergent decomposition algorithm that solves this non-convex QSSVM model effectively and efficiently (in terms of computational cost). Numerical tests on artificial and public benchmark data indicate that the proposed unsupervised QSSVM method outperforms well-known clustering methods (including SVM-based and other state-of-the-art methods), particularly in terms of classification accuracy. Moreover, we extend the proposed method and apply it to credit risk assessment by incorporating t-test-based feature weights. The promising numerical results on benchmark personal credit data and real-world corporate credit data demonstrate the effectiveness, efficiency and interpretability of the proposed method, and indicate its potential in certain real-world applications.
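The abstract names a golden-section algorithm but does not state which quantity it searches over, so the following is only a generic sketch of golden-section search for minimizing a unimodal function on an interval, not the authors' procedure. The function name `golden_section_search`, its parameters, and the example objective are hypothetical and chosen for illustration.

```python
import math


def golden_section_search(f, lo, hi, tol=1e-6):
    """Approximate the minimizer of a unimodal function f on [lo, hi].

    Generic textbook golden-section search; an illustrative assumption,
    not the algorithm from the paper.
    """
    inv_phi = (math.sqrt(5) - 1) / 2  # reciprocal of the golden ratio, about 0.618
    a, b = lo, hi
    # Two interior probe points that split [a, b] in golden-ratio proportions.
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    fc, fd = f(c), f(d)
    while (b - a) > tol:
        if fc < fd:
            # The minimum lies in [a, d]; the old c becomes the new d.
            b, d, fd = d, c, fc
            c = b - inv_phi * (b - a)
            fc = f(c)
        else:
            # The minimum lies in [c, b]; the old d becomes the new c.
            a, c, fc = c, d, fd
            d = a + inv_phi * (b - a)
            fd = f(d)
    return (a + b) / 2


# Example use on a hypothetical objective with its minimum at t = 2.
t_star = golden_section_search(lambda t: (t - 2.0) ** 2, 0.0, 5.0)
```

Each iteration reuses one of the two previous function values, so only one new evaluation of `f` is needed per step, which is the usual reason to prefer this scheme when each evaluation is expensive.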
