Dissimilarity functions for rank-invariant hierarchical clustering of continuous variables

Abstract: A theoretical framework is presented for a copula-based notion of dissimilarity between continuous random vectors. The proposed dissimilarity assigns its smallest value to pairs of random vectors that are comonotonic. Its main properties are studied, with special attention to those relevant to hierarchical agglomerative methods, such as reducibility. Some insights are provided on the use of such a measure in clustering algorithms, and a simulation study is presented. Real case studies illustrate the main features of the whole methodology.
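To make the idea concrete, the following minimal sketch clusters variables with a rank-invariant dissimilarity. It is not the dissimilarity proposed in the paper, whose definition is not given in the abstract. As a stand-in assumption it uses (1 - tau) / 2, with tau the sample Kendall's tau, which equals 0 for comonotonic samples and is unchanged by strictly increasing transformations of either variable. The function names `kendall_tau`, `dissimilarity`, and `complete_linkage`, the choice of complete linkage, and the toy data are all hypothetical.

```python
import math
from itertools import combinations


def kendall_tau(x, y):
    """Sample Kendall's tau for two equally long sequences without ties."""
    n = len(x)
    s = 0
    for i, j in combinations(range(n), 2):
        prod = (x[i] - x[j]) * (y[i] - y[j])
        s += (prod > 0) - (prod < 0)  # +1 if concordant, -1 if discordant
    return s / (n * (n - 1) / 2)


def dissimilarity(x, y):
    """Stand-in dissimilarity (an assumption): 0 when the samples are comonotonic."""
    return (1.0 - kendall_tau(x, y)) / 2.0


def complete_linkage(columns, n_clusters):
    """Agglomerative clustering of variables (columns) with complete linkage."""
    pairs = combinations(range(len(columns)), 2)
    d = {(a, b): dissimilarity(columns[a], columns[b]) for a, b in pairs}

    def between(c1, c2):
        # Complete linkage: largest pairwise dissimilarity across the two clusters.
        return max(d[(min(a, b), max(a, b))] for a in c1 for b in c2)

    clusters = [[k] for k in range(len(columns))]
    while len(clusters) > n_clusters:
        # Merge the two closest clusters; i < j, so deleting index j is safe.
        i, j = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: between(clusters[p[0]], clusters[p[1]]),
        )
        clusters[i] = sorted(clusters[i] + clusters[j])
        del clusters[j]
    return clusters


# Toy data: y is an increasing function of x, w is an increasing function of z.
x = [0.3, 1.2, -0.7, 2.5, 0.9, -1.4]
y = [math.exp(v) for v in x]
z = [-v for v in x]
w = [v ** 3 for v in z]
columns = [x, y, z, w]

print(dissimilarity(x, y))           # comonotonic pair
print(complete_linkage(columns, 2))  # indices of the variables in each cluster
```

Because the stand-in depends on the data only through the ordering of the observations, replacing any variable by a strictly increasing transformation of itself leaves the clustering unchanged, which is the sense of "rank-invariant" in the title.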
