Cluster Analysis for mixed data: An application to credit risk evaluation

Abstract: Credit risk is one of the main risks a bank faces when providing financial products and services to clients. Several scoring methodologies have been proposed to evaluate clients' financial performance, most of them based on quantitative indicators. This paper highlights the relevance of both quantitative and qualitative features of applicants and proposes a new methodology based on mixed data clustering techniques. Cluster analysis may prove particularly useful in estimating credit risk. Traditionally, clustering handles either quantitative or qualitative data, but not both at once; however, since credit applicants are characterized by mixed personal features, a cluster analysis designed for mixed data can uncover particularly informative patterns and help estimate the risk associated with granting credit.
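To make the idea of clustering on mixed features concrete, the sketch below computes a Gower-style dissimilarity between applicants. It scales numeric differences by each feature's range and scores categorical features as a simple match or mismatch. The abstract does not say which dissimilarity or clustering algorithm the paper uses, so this is only one common choice, shown under that assumption. The applicant records, feature names, and the `gower_dissimilarity` helper are all hypothetical.

```python
def gower_dissimilarity(a, b, numeric_ranges):
    """Gower-style dissimilarity between two records with mixed features.

    a, b: dicts mapping feature name -> value (same keys in both).
    numeric_ranges: dict mapping each numeric feature to its range
    (max - min) over the data set; any feature not listed here is
    treated as categorical.
    """
    total = 0.0
    for name in a:
        if name in numeric_ranges:
            rng = numeric_ranges[name]
            # Range-scaled absolute difference, always in [0, 1].
            total += abs(a[name] - b[name]) / rng if rng > 0 else 0.0
        else:
            # Simple matching: 0 if the categories agree, 1 otherwise.
            total += 0.0 if a[name] == b[name] else 1.0
    return total / len(a)


# Hypothetical applicants with quantitative and qualitative features.
applicants = [
    {"income": 20000, "age": 25, "employment": "temporary", "home": "rent"},
    {"income": 60000, "age": 45, "employment": "permanent", "home": "own"},
    {"income": 22000, "age": 27, "employment": "temporary", "home": "rent"},
]

# Range of each numeric feature over the (toy) data set.
numeric_ranges = {
    name: max(r[name] for r in applicants) - min(r[name] for r in applicants)
    for name in ("income", "age")
}

# Full pairwise dissimilarity matrix.
matrix = [
    [gower_dissimilarity(a, b, numeric_ranges) for b in applicants]
    for a in applicants
]

for row in matrix:
    print([round(d, 4) for d in row])
```

In this toy data, applicants 0 and 2 come out close together and both are far from applicant 1. A dissimilarity-based method such as k-medoids or hierarchical clustering would use this kind of structure to group applicants.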
