Large-Scale Multiple Testing of Correlations

Multiple testing of correlations arises in many applications, including gene coexpression network analysis and brain connectivity analysis. In this article, we consider large-scale simultaneous testing for correlations in both the one-sample and two-sample settings. New multiple testing procedures are proposed, and a bootstrap method is introduced for estimating the proportion of the nulls falsely rejected among all the true nulls. We investigate the properties of the proposed procedures both theoretically and numerically. It is shown that the procedures asymptotically control the overall false discovery rate and false discovery proportion at the nominal level. Simulation results show that the proposed methods perform well in terms of both size and power, and that they significantly outperform two alternative methods. The two-sample procedure is also illustrated by an analysis of a prostate cancer dataset for the detection of changes in coexpression patterns between gene expression levels. Supplementary materials for this article are available online.
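
To make the one-sample problem concrete, the following is a minimal sketch of a generic baseline, not the procedure proposed in this article: each pairwise correlation is tested against zero using Fisher's z transformation, and the resulting p-values are passed to the Benjamini-Hochberg step-up rule. The function names `fisher_z_pvalues` and `benjamini_hochberg`, the level `alpha`, and the simulated data are illustrative assumptions. The sketch assumes approximately Gaussian observations and makes no adjustment for dependence among the sample correlations.

```python
import math
import numpy as np


def fisher_z_pvalues(X):
    """Two-sided p-values for H0: rho_ij = 0 over all pairs i < j.

    X is an (n, p) data matrix with n observations of p variables.
    Assumes roughly Gaussian data, so that sqrt(n - 3) * atanh(r_ij)
    is approximately standard normal under the null.
    """
    n, p = X.shape
    R = np.corrcoef(X, rowvar=False)          # p x p sample correlation matrix
    iu = np.triu_indices(p, k=1)              # indices of the p(p-1)/2 pairs
    r = np.clip(R[iu], -0.999999, 0.999999)   # keep atanh finite
    z = np.arctanh(r) * math.sqrt(n - 3)
    # Two-sided normal p-value: 2 * (1 - Phi(|z|)) = erfc(|z| / sqrt(2))
    pvals = np.array([math.erfc(abs(v) / math.sqrt(2.0)) for v in z])
    return iu, pvals


def benjamini_hochberg(pvals, alpha=0.05):
    """Boolean rejection mask from the Benjamini-Hochberg step-up rule."""
    pvals = np.asarray(pvals, dtype=float)
    m = len(pvals)
    order = np.argsort(pvals)
    below = pvals[order] <= alpha * np.arange(1, m + 1) / m
    reject = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])      # largest rank meeting the bound
        reject[order[: k + 1]] = True
    return reject


if __name__ == "__main__":
    # Simulated example: 10 variables, only variables 0 and 1 are correlated.
    rng = np.random.default_rng(0)
    X = rng.standard_normal((200, 10))
    X[:, 1] = X[:, 0] + 0.3 * rng.standard_normal(200)
    (rows, cols), pvals = fisher_z_pvalues(X)
    reject = benjamini_hochberg(pvals, alpha=0.05)
    for i, j in zip(rows[reject], cols[reject]):
        print(f"rejected H0 for pair ({i}, {j})")
```

A two-sample analogue under the same assumptions would compare the Fisher-transformed correlations from the two groups pair by pair. In either setting, this baseline treats the pairwise tests marginally and carries no guarantee under the dependence structure that arises among sample correlations.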
