Estimating a common covariance matrix for network meta-analysis of gene expression datasets in diffuse large B-cell lymphoma

Estimating gene networks from malignant-tissue data, combined with post hoc analysis, is a major challenge in cancer systems biology; it can improve our understanding of disease pathology and may eventually help identify new drug targets. Motivated by the need to stabilise covariance estimation, which is inherently unstable and further compounded by noisy gene expression data, we present a hierarchical random covariance model applied as a meta-analysis of gene networks across eleven large-scale gene expression studies of diffuse large B-cell lymphoma (DLBCL). The approach was inspired by traditional meta-analysis using random effects models, and we derive and compare basic properties and estimators of the model. We also suggest simple inference for, and an interpretation of, a parameter introduced to measure inter-class homogeneity. The methods are generally applicable where multiple classes are present and believed to share a common covariance matrix of interest that is obscured by class-dependent noise. As such, the model provides a basis for meta- or integrative analysis of covariance matrices where the classes are formed by datasets. In a post hoc analysis of the estimated common covariance matrix for the DLBCL data, we were able to identify biologically meaningful gene networks of prognostic value. Of particular interest was a network with the S100 family of calcium-binding proteins as central players, which further supports indications that knockdown of these proteins may improve immunotherapy strategies and outcomes for lymphoma patients.
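To make the idea of a covariance matrix shared across classes concrete, the sketch below computes the classical pooled sample covariance over several datasets. This is a naive baseline chosen for illustration only, not the hierarchical random covariance model or any estimator from the paper: it assumes every class has exactly the same covariance and ignores class-dependent noise. The function name `pooled_covariance` and the toy data are hypothetical.

```python
import numpy as np


def pooled_covariance(datasets):
    """Pooled sample covariance across classes (naive baseline).

    Each element of `datasets` is an (n_i x p) array of samples by genes.
    Every class is centred on its own mean, the scatter matrices are
    summed, and the total is divided by sum(n_i - 1).
    """
    scatter = None
    dof = 0
    for x in datasets:
        x = np.asarray(x, dtype=float)
        if x.ndim != 2 or x.shape[0] < 2:
            raise ValueError("each dataset needs at least two samples (rows)")
        centered = x - x.mean(axis=0)   # remove the class-specific mean
        s = centered.T @ centered       # p x p scatter matrix of this class
        if scatter is None:
            scatter = s
        elif s.shape != scatter.shape:
            raise ValueError("all datasets must have the same number of genes")
        else:
            scatter = scatter + s
        dof += x.shape[0] - 1
    return scatter / dof


# Toy example: two classes, two genes, two samples each.
class_a = [[0.0, 0.0], [2.0, 2.0]]
class_b = [[0.0, 0.0], [2.0, -2.0]]
pooled = pooled_covariance([class_a, class_b])
```

In the toy example the two classes have opposite-signed sample covariances between the two genes, so the off-diagonal entries cancel in the pooled estimate. A random-effects approach of the kind described above would instead treat such between-class differences as variation around the common matrix rather than averaging them away.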
