Inference for high-dimensional differential correlation matrices

Motivated by differential co-expression analysis in genomics, this paper considers estimation and testing of high-dimensional differential correlation matrices. An adaptive thresholding procedure is introduced and theoretical guarantees are given. The minimax rate of convergence is established, and the proposed estimator is shown to be adaptively rate-optimal over collections of paired correlation matrices with approximately sparse differences. Simulation results show that the procedure significantly outperforms two other natural methods that are based on separate estimation of the individual correlation matrices. The procedure is also illustrated through an analysis of a breast cancer dataset, which provides evidence at the gene co-expression level that several genes, a subset of which have been previously verified, are associated with breast cancer. Hypothesis testing on the differential correlation matrices is also considered, and a test that is particularly well suited to sparse alternatives is introduced. Related problems are also discussed, including estimation of a single sparse correlation matrix, estimation of differential covariance matrices, and estimation of differential cross-correlation matrices.
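To make the idea of thresholding a differential correlation matrix concrete, the following is a minimal sketch and not the procedure proposed in the paper. It assumes a single universal threshold of the form tau * (sqrt(log p / n1) + sqrt(log p / n2)) applied to the difference of the two sample correlation matrices, whereas an adaptive procedure would use entry-specific thresholds. The function name `thresholded_diff_corr` and the tuning constant `tau` are hypothetical and chosen only for illustration.

```python
import numpy as np


def thresholded_diff_corr(x1, x2, tau=2.0):
    """Hard-threshold the difference of two sample correlation matrices.

    x1, x2 : arrays of shape (n1, p) and (n2, p); rows are samples.
    tau    : hypothetical tuning constant (an assumption, not from the paper).

    Note: this uses one universal threshold for all entries, which is a
    simplification of an entry-adaptive thresholding rule.
    """
    x1 = np.asarray(x1, dtype=float)
    x2 = np.asarray(x2, dtype=float)
    n1, p = x1.shape
    n2, p2 = x2.shape
    if p != p2:
        raise ValueError("both samples must have the same number of variables")

    # Sample correlation matrices, with variables in columns.
    r1 = np.corrcoef(x1, rowvar=False)
    r2 = np.corrcoef(x2, rowvar=False)
    diff = r1 - r2

    # Universal threshold level (assumed form, for illustration only).
    lam = tau * (np.sqrt(np.log(p) / n1) + np.sqrt(np.log(p) / n2))

    # Keep entries whose magnitude reaches the threshold; zero out the rest.
    return np.where(np.abs(diff) >= lam, diff, 0.0)
```

With two groups of expression profiles stored as sample-by-gene arrays, `thresholded_diff_corr(x1, x2)` returns a sparse matrix whose nonzero entries flag gene pairs whose correlation appears to differ between the groups. In practice `tau` would have to be chosen by cross-validation or a similar data-driven rule.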
