On Clustering Financial Time Series: A Need for Distances Between Dependent Random Variables

This working document summarizes our work on the clustering of financial time series. It was written for a workshop on information geometry and its applications to image and signal processing, which brought together experts in pure and applied mathematics and applied researchers from medical imaging, radar signal processing and finance. The authors belong to the last group. The document serves as a long introduction to the further development of geometric tools for financial applications such as risk or portfolio analysis. Risk and portfolio analysis rely essentially on covariance matrices. Beyond the fact that the underlying Gaussian assumption is known to be inaccurate, covariance matrices are difficult to estimate from empirical data. To filter noise from the empirical estimate, Mantegna proposed using hierarchical clustering. In this work, we first show that this procedure is statistically consistent. We then propose to use clustering for a much broader purpose than filtering empirical covariance matrices built from estimated correlation coefficients. To do so, we need distances between financial time series that incorporate all the available information in these cross-dependent random processes.
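The filtering step mentioned above can be illustrated with a minimal sketch. It rests on assumptions the abstract does not state: correlations are mapped to distances with d = sqrt(2(1 - rho)), the hierarchy is single linkage, and the filtered correlation is read back from the tree's merge heights. The function names are hypothetical, and the single-linkage heights are computed as minimax path distances (the subdominant ultrametric) rather than by building the tree explicitly.

```python
import math

def correlation_to_distance(corr):
    # Assumed mapping: d_ij = sqrt(2 * (1 - rho_ij)).
    n = len(corr)
    return [[math.sqrt(max(0.0, 2.0 * (1.0 - corr[i][j]))) for j in range(n)]
            for i in range(n)]

def subdominant_ultrametric(dist):
    # Single-linkage merge heights equal minimax path distances:
    # u_ij = min over paths of the largest edge on the path.
    n = len(dist)
    u = [row[:] for row in dist]
    for k in range(n):
        for i in range(n):
            for j in range(n):
                via_k = max(u[i][k], u[k][j])
                if via_k < u[i][j]:
                    u[i][j] = via_k
    return u

def filter_correlation(corr):
    # Invert the distance mapping on the ultrametric distances.
    u = subdominant_ultrametric(correlation_to_distance(corr))
    n = len(corr)
    return [[1.0 - u[i][j] ** 2 / 2.0 for j in range(n)] for i in range(n)]

# Toy 3-asset correlation matrix (made-up numbers, for illustration only).
corr = [[1.0, 0.8, 0.3],
        [0.8, 1.0, 0.5],
        [0.3, 0.5, 1.0]]
filtered = filter_correlation(corr)
```

In this toy example, assets 0 and 1 merge first, so the filtered matrix assigns the resulting cluster a single correlation with asset 2. The weak 0.3 entry is replaced by the value implied by the hierarchy.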