A two-stage approach of gene network analysis for high-dimensional heterogeneous data.

Gaussian graphical models have been widely used to construct gene regulatory networks from gene expression data. Most existing methods for Gaussian graphical models are designed for homogeneous data and assume a single Gaussian distribution. In practice, however, the data may combine gene expression studies that differ in unknown confounding factors, such as study cohort, microarray platform, and experimental batch. These factors make the data heterogeneous and can lead to false-positive edges or low detection power in the resulting network. To overcome this problem and improve the performance of gene network construction, we propose a two-stage approach for constructing a gene network from heterogeneous data. The first stage performs a clustering analysis that assigns the samples to a few clusters such that the samples within each cluster are approximately homogeneous. The second stage conducts an integrative analysis of the networks from the individual clusters. In particular, we first apply a model-based clustering method that uses the singular value decomposition to handle high-dimensional data, and then integrate the networks from the clusters using the integrative $\psi$-learning method. The proposed method is based on an equivalent measure of partial correlation coefficients in Gaussian graphical models, which is computed with a reduced conditioning set and is therefore suitable for high-dimensional data. We compare the proposed two-stage learning approach with several existing methods in various simulation settings and demonstrate its robustness. Finally, we apply the method to integrate multiple gene expression studies of lung adenocarcinoma in order to identify potential therapeutic targets and treatment biomarkers.
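
The two stages described above can be illustrated with a minimal Python sketch. It is not the authors' implementation, and several simplifications are assumptions made only for illustration: plain k-means on SVD scores stands in for the model-based clustering step, full-order partial correlations from the inverse sample covariance (which require more samples than genes in every cluster) stand in for the $\psi$-scores computed with a reduced conditioning set, and an equal-weight Stouffer-type sum of Fisher z-scores with a fixed cutoff stands in for the integrative $\psi$-learning step. All function names, the simulated data, and the `threshold` parameter are hypothetical.

```python
import numpy as np


def svd_scores(X, n_components):
    """Project centered samples onto the leading singular directions."""
    Xc = X - X.mean(axis=0)
    U, s, _ = np.linalg.svd(Xc, full_matrices=False)
    return U[:, :n_components] * s[:n_components]


def kmeans(Z, n_clusters, n_iter=100):
    """Plain k-means with deterministic farthest-point initialisation
    (a stand-in for model-based clustering, not the paper's method)."""
    centers = [Z[0]]
    for _ in range(1, n_clusters):
        d = np.min([((Z - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(Z[np.argmax(d)])
    centers = np.array(centers)
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        new_centers = np.array([
            Z[labels == j].mean(axis=0) if np.any(labels == j) else centers[j]
            for j in range(n_clusters)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    return labels


def partial_corr_z(X):
    """Fisher z-scores of full-order partial correlations (needs n > p + 1)."""
    n, p = X.shape
    if n <= p + 1:
        raise ValueError("cluster too small for full-order partial correlations")
    prec = np.linalg.pinv(np.cov(X, rowvar=False))
    d = np.sqrt(np.diag(prec))
    r = -prec / np.outer(d, d)
    np.fill_diagonal(r, 0.0)
    r = np.clip(r, -0.999999, 0.999999)
    # Conditioning on the other p - 2 variables leaves n - p - 1 in the scaling.
    return np.sqrt(n - p - 1) * np.arctanh(r)


def two_stage_network(X, n_clusters, n_components, threshold):
    """Stage 1: cluster samples. Stage 2: combine per-cluster z-scores."""
    labels = kmeans(svd_scores(X, n_components), n_clusters)
    z_sum = np.zeros((X.shape[1], X.shape[1]))
    for j in range(n_clusters):
        z_sum += partial_corr_z(X[labels == j])
    z = z_sum / np.sqrt(n_clusters)  # equal-weight Stouffer-type combination
    upper = np.triu(np.abs(z) > threshold, k=1)
    edges = [(int(i), int(j)) for i, j in np.argwhere(upper)]
    return labels, edges


# Simulated example: two batches share one true edge (gene 0 - gene 1)
# but differ by a mean shift that mimics an unknown batch effect.
rng = np.random.default_rng(0)


def simulate(n, shift):
    X = rng.normal(size=(n, 5))
    X[:, 1] += 0.8 * X[:, 0]
    return X + shift


X = np.vstack([simulate(200, 0.0), simulate(200, 6.0)])
labels, edges = two_stage_network(X, n_clusters=2, n_components=1, threshold=4.0)
```

In this toy setting the batch shift acts like a hidden common factor, so estimating partial correlations on the pooled samples would tend to link genes that are unrelated within each batch. Clustering first and combining per-cluster evidence is intended to avoid that confounding.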
