Ridge‐penalized adaptive Mantel test and its application in imaging genetics

We propose a ridge-penalized adaptive Mantel test (AdaMant) for evaluating the association between two high-dimensional sets of features. By introducing a ridge penalty, AdaMant tests the association across many metrics simultaneously. We demonstrate how ridge penalization bridges Euclidean and Mahalanobis distances, and their corresponding linear models, from the perspective of association measurement and testing. This result is of theoretical interest and has important implications for penalized hypothesis testing, especially in high-dimensional settings such as imaging genetics. Applying the proposed method to an imaging genetic study of visual working memory in healthy adults, we identified associations of brain connectivity (measured by electroencephalogram coherence) with selected genetic features.
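The bridging idea can be illustrated with a small numerical sketch. The abstract does not give the test statistic, so everything below is an assumption made for illustration: the ridge similarity is taken to be X(XᵀX + λI)⁻¹Xᵀ on centered features, the second feature set uses a linear kernel, the statistic is the trace inner product of the two similarity matrices, and the adaptive step takes the minimum permutation p-value over a grid of λ values. The function names `ridge_similarity` and `adaptive_mantel` and the grid `lams` are hypothetical, not taken from the paper.

```python
import numpy as np


def ridge_similarity(X, lam):
    """Assumed ridge similarity: Xc (Xc'Xc + lam*I)^{-1} Xc' for column-centered Xc.

    With lam near 0 (and Xc'Xc invertible) this is the projection (hat) matrix,
    which corresponds to Mahalanobis-type similarity. For large lam,
    lam * K approaches Xc Xc', the Euclidean inner-product similarity.
    """
    Xc = X - X.mean(axis=0)
    p = Xc.shape[1]
    return Xc @ np.linalg.solve(Xc.T @ Xc + lam * np.eye(p), Xc.T)


def adaptive_mantel(X, Y, lams, n_perm=499, seed=0):
    """Min-p permutation test over a grid of ridge penalties (illustrative sketch).

    Returns the adaptive p-value and the per-lambda permutation p-values.
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    Yc = Y - Y.mean(axis=0)
    H = Yc @ Yc.T  # linear-kernel similarity for Y (an assumption)
    Ks = [ridge_similarity(X, lam) for lam in lams]

    # Row 0 holds the observed ordering; remaining rows permute subjects in H.
    perms = [np.arange(n)] + [rng.permutation(n) for _ in range(n_perm)]
    # For symmetric K and H, trace(K @ H) equals the elementwise sum of K * H.
    stats = np.array(
        [[np.sum(K * H[np.ix_(pi, pi)]) for K in Ks] for pi in perms]
    )

    # pvals[i, j]: fraction of rows whose statistic at lambda j is >= row i's.
    pvals = (stats[None, :, :] >= stats[:, None, :]).mean(axis=1)
    # Calibrate the minimum p-value against the same set of permutations.
    min_p = pvals.min(axis=1)
    p_adaptive = float(np.mean(min_p <= min_p[0]))
    return p_adaptive, pvals[0]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    X = rng.normal(size=(40, 5))
    Y = X @ rng.normal(size=(5, 3)) + 0.1 * rng.normal(size=(40, 3))
    p_adaptive, p_by_lambda = adaptive_mantel(X, Y, lams=[0.1, 10.0, 1000.0])
    print(p_adaptive, p_by_lambda)
```

Reusing one set of permutations for both the per-λ p-values and the minimum-p calibration keeps the cost at a single pass over the permutations; whether this matches the authors' procedure is not stated in the abstract.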
