mgcpy: A Comprehensive High Dimensional Independence Testing Python Package

As the amount of data grows in many fields, methods that consistently and efficiently decipher relationships within high-dimensional datasets are important. Because many modern datasets are high-dimensional, univariate independence tests are not applicable. While many multivariate independence tests have R packages available, their interfaces are inconsistent and most are not available in Python. mgcpy is an extensive Python library that includes many state-of-the-art high-dimensional independence testing procedures behind a common interface. The package is easy to use and flexible enough to enable future extensions. This manuscript provides details for each of the tests, as well as extensive power and run-time benchmarks on a suite of high-dimensional simulations previously used in different publications. The appendix includes demonstrations of how the user can interact with the package, as well as links and documentation.
