Clustering Techniques and Their Effect on Portfolio Formation and Risk Analysis

This paper applies three portfolio formation rules, each based on a standard clustering technique (K-means, K-medoids, and hierarchical clustering), to a large financial data set (16 years of daily CRSP stock data). The aim is to determine how the choice of clustering technique may affect analysts' perceptions of the riskiness of different portfolios in the context of a prototype visual analytics system designed for financial stability monitoring. We use a two-phase experimental approach with visualizations to explore the effects of the different clustering techniques. We find that the choice of clustering technique matters: there is significant variation among techniques, resulting in different "pictures" of the riskiness of the same underlying data when plotted in the visual analytics tool. This sensitivity to clustering methodology has the potential to mislead analysts about the riskiness of portfolios. We conclude that further research into the implications of portfolio formation rules is needed, and that visual analytics tools should not limit analysts to a single clustering technique, but should instead provide the facility to explore the data using different techniques.
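As a rough illustration of how the three techniques named above can partition the same data differently, the following sketch clusters a small synthetic data set and compares the resulting partitions pairwise. It is not the paper's experimental setup: the data are randomly generated stand-ins rather than CRSP data, the two features (mean return, volatility) are hypothetical, the hierarchical method is assumed to use average linkage, and the Rand index is chosen here only as a convenient agreement measure.

```python
import numpy as np

def pairwise_dist(A, B):
    """Euclidean distances between the rows of A and the rows of B."""
    return np.linalg.norm(A[:, None, :] - B[None, :, :], axis=2)

def kmeans(X, k, n_iter=50, seed=0):
    """Lloyd's algorithm: assign to nearest mean, then recompute means."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        labels = pairwise_dist(X, centers).argmin(axis=1)
        new_centers = np.array([
            X[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels

def kmedoids(X, k, n_iter=50, seed=0):
    """Alternating K-medoids: centers are restricted to actual data points."""
    rng = np.random.default_rng(seed)
    D = pairwise_dist(X, X)
    medoids = rng.choice(len(X), size=k, replace=False)
    for _ in range(n_iter):
        labels = D[:, medoids].argmin(axis=1)
        new_medoids = medoids.copy()
        for j in range(k):
            members = np.flatnonzero(labels == j)
            if members.size:
                # Pick the member with the smallest total distance to the others.
                cost = D[np.ix_(members, members)].sum(axis=0)
                new_medoids[j] = members[cost.argmin()]
        if np.array_equal(new_medoids, medoids):
            break
        medoids = new_medoids
    return D[:, medoids].argmin(axis=1)

def hierarchical(X, k):
    """Naive agglomerative clustering with average linkage (assumed here)."""
    D = pairwise_dist(X, X)
    clusters = [[i] for i in range(len(X))]
    while len(clusters) > k:
        best = (np.inf, 0, 1)
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                d = D[np.ix_(clusters[a], clusters[b])].mean()
                if d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a] += clusters.pop(b)  # merge the two closest clusters
    labels = np.empty(len(X), dtype=int)
    for j, members in enumerate(clusters):
        labels[members] = j
    return labels

def rand_index(a, b):
    """Fraction of point pairs on which two partitions agree."""
    same_a = a[:, None] == a[None, :]
    same_b = b[:, None] == b[None, :]
    iu = np.triu_indices(len(a), k=1)
    return float(np.mean(same_a[iu] == same_b[iu]))

# Synthetic stand-in for per-stock features: (mean return, volatility).
rng = np.random.default_rng(42)
group_centers = np.array([[0.0, 1.0], [0.5, 3.0], [1.0, 5.0]])
X = np.vstack([c + 0.15 * rng.standard_normal((20, 2)) for c in group_centers])

labels = {
    "k-means": kmeans(X, 3),
    "k-medoids": kmedoids(X, 3),
    "hierarchical": hierarchical(X, 3),
}
names = list(labels)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        ri = rand_index(labels[names[i]], labels[names[j]])
        print(f"{names[i]} vs {names[j]}: Rand index = {ri:.3f}")
```

A Rand index of 1 means two techniques produced the same grouping; lower values indicate that the same stocks would be placed in different portfolios depending on the technique chosen.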