Enhanced construction of gene regulatory networks using hub gene information

Background: Gene regulatory networks reveal how genes work together to carry out their biological functions. Reconstructing gene networks from gene expression data greatly facilitates our understanding of the underlying biological mechanisms and provides new opportunities for biomarker and drug discovery. In a gene network, a gene that interacts with many other genes is called a hub gene; hub genes usually play an essential role in gene regulation and biological processes. In this study, we developed a method for reconstructing gene networks using a partial correlation-based approach that incorporates prior information about hub genes. Through simulation studies and two real-data examples, we compare the performance of the proposed method with that of existing methods in estimating network structures.

Results: In simulation studies, we show that the proposed strategy reduces errors in estimating network structures compared to the existing methods. When applied to Escherichia coli, the regulatory network constructed by our proposed ESPACE method is more consistent with current biological knowledge than the network constructed by the SPACE method. Furthermore, application of the proposed method to lung cancer identified hub genes whose mRNA expression predicts cancer progression and patient response to treatment.

Conclusions: We have demonstrated that incorporating hub gene information in estimating network structures can improve the performance of the existing methods.
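To make the idea of a partial correlation-based network with hub gene prior information more concrete, the following is a minimal sketch under stated assumptions, not the ESPACE implementation. It assumes that hub information enters as a smaller penalty on edges touching known hub genes (the factor `alpha` and all function names here are hypothetical), and it swaps the joint sparse regression used by SPACE-type methods for a simple weighted soft-thresholding of partial correlations computed from a given precision matrix.

```python
import numpy as np

def partial_correlations(precision):
    """Partial correlations from a precision (inverse covariance) matrix:
    rho_ij = -omega_ij / sqrt(omega_ii * omega_jj)."""
    d = np.sqrt(np.diag(precision))
    rho = -precision / np.outer(d, d)
    np.fill_diagonal(rho, 1.0)
    return rho

def hub_penalty_weights(n_genes, hub_indices, alpha=0.5):
    """Assumed weighting scheme: edges touching a known hub gene get
    penalty weight alpha (< 1); all other edges get weight 1."""
    w = np.ones((n_genes, n_genes))
    hubs = list(hub_indices)
    w[hubs, :] = alpha
    w[:, hubs] = alpha
    np.fill_diagonal(w, 0.0)  # the diagonal is never penalized
    return w

def weighted_soft_threshold(rho, weights, lam):
    """Illustrative stand-in for a weighted L1 penalty: shrink each
    off-diagonal partial correlation by lam * weight, flooring at zero."""
    shrunk = np.sign(rho) * np.maximum(np.abs(rho) - lam * weights, 0.0)
    np.fill_diagonal(shrunk, 1.0)
    return shrunk

# Toy example: three genes in a chain, with gene 0 treated as a known hub.
toy_precision = np.array([[2.0, -0.6, 0.0],
                          [-0.6, 2.0, -0.6],
                          [0.0, -0.6, 2.0]])
toy_rho = partial_correlations(toy_precision)
toy_weights = hub_penalty_weights(3, hub_indices=[0], alpha=0.5)
toy_network = weighted_soft_threshold(toy_rho, toy_weights, lam=0.4)
```

In this toy case the two true edges have the same partial correlation, but only the edge touching the assumed hub survives thresholding, because its penalty is halved. This illustrates how hub information could help retain edges around hub genes at a given sparsity level.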
