Measures of dependence between random vectors and tests of independence. Literature review

Simple correlation coefficients between two variables have been generalized in many ways to measure association between two matrices. Coefficients such as the RV coefficient, the distance covariance (dCov) coefficient and kernel-based coefficients have been adopted by different research communities. Scientists use these coefficients to test whether two random vectors are linked; if they are, it is important to uncover what patterns exist in these associations. We discuss measures of dependence between random vectors and tests of independence, and show links between the different approaches. We document some interesting rediscoveries and the lack of interconnection between bodies of literature. After providing definitions of the coefficients and their associated tests, we present recent improvements that enhance their statistical properties and ease of interpretation. We summarize multi-table approaches and provide scenarios where the indices give useful summaries of heterogeneous multi-block data. We illustrate these strategies on several real-data examples and suggest directions for future research.
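To make the two best-known coefficients concrete, here is a minimal pure-Python sketch of the RV coefficient, the sample distance correlation (dCor, the normalized dCov) and a row-permutation test of independence. It assumes Euclidean distances, column-centered data and the biased V-statistic form of dCov. The function names (`rv_coefficient`, `distance_correlation`, `permutation_test`) are hypothetical and do not reproduce the API of any R package such as ade4, FactoMineR or energy.

```python
import math
import random

# Hypothetical helper names: an illustrative sketch, not any package's API.


def center_columns(m):
    # Subtract each column mean so every variable has mean zero.
    n, p = len(m), len(m[0])
    means = [sum(row[j] for row in m) / n for j in range(p)]
    return [[row[j] - means[j] for j in range(p)] for row in m]


def cross_product(m):
    # n x n matrix W = M M^T of inner products between observations (rows).
    return [[sum(a * b for a, b in zip(r, s)) for s in m] for r in m]


def frob_inner(a, b):
    # Frobenius inner product sum_ij A_ij B_ij, equal to tr(A B) for symmetric A, B.
    return sum(u * v for ra, rb in zip(a, b) for u, v in zip(ra, rb))


def rv_coefficient(x, y):
    # RV = tr(W_X W_Y) / sqrt(tr(W_X^2) tr(W_Y^2)) on column-centered X and Y.
    wx = cross_product(center_columns(x))
    wy = cross_product(center_columns(y))
    denom = math.sqrt(frob_inner(wx, wx) * frob_inner(wy, wy))
    return frob_inner(wx, wy) / denom if denom > 0 else 0.0


def double_centered_distances(m):
    # Euclidean distance matrix with row, column and grand means removed.
    # The matrix is symmetric, so column means equal row means.
    n = len(m)
    d = [[math.dist(r, s) for s in m] for r in m]
    row = [sum(r) / n for r in d]
    grand = sum(row) / n
    return [[d[i][j] - row[i] - row[j] + grand for j in range(n)] for i in range(n)]


def distance_correlation(x, y):
    # Sample dCor (V-statistic form). dCov^2 = (1/n^2) sum A_ij B_ij; the 1/n^2
    # factors cancel in dCor^2 = dCov^2(X,Y) / sqrt(dVar^2(X) dVar^2(Y)).
    a = double_centered_distances(x)
    b = double_centered_distances(y)
    denom = math.sqrt(frob_inner(a, a) * frob_inner(b, b))
    if denom == 0:
        return 0.0
    return math.sqrt(max(frob_inner(a, b) / denom, 0.0))


def permutation_test(stat, x, y, n_perm=999, seed=0):
    # Permuting the rows of Y breaks any link with X while keeping each
    # table's internal structure; the p-value counts permuted statistics
    # at least as large as the observed one.
    rng = random.Random(seed)
    observed = stat(x, y)
    idx = list(range(len(y)))
    count = 0
    for _ in range(n_perm):
        rng.shuffle(idx)
        if stat(x, [y[i] for i in idx]) >= observed:
            count += 1
    return observed, (count + 1) / (n_perm + 1)


# Usage: Y depends on X only through X[0]**2, a non-linear link.
rng = random.Random(1)
x = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
y = [[r[0] ** 2 + 0.1 * rng.gauss(0, 1)] for r in x]
print("RV:  ", permutation_test(rv_coefficient, x, y, n_perm=199))
print("dCor:", permutation_test(distance_correlation, x, y, n_perm=199))
```

The usage lines illustrate the design difference between the two coefficients. Since tr(W_X W_Y) equals the squared Frobenius norm of X^T Y, the RV coefficient depends on the data only through cross-products between columns and therefore targets linear association, which can miss a quadratic link like this one. The population distance correlation, by contrast, is zero only under independence (for vectors with finite first moments).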
