Estimating genome-wide regulatory activity from multi-omics data sets using mathematical optimization

Background: Gene regulation is one of the most important cellular processes, indispensable for the adaptability of organisms and closely interlinked with several classes of pathogenesis and their progression. Regulatory mechanisms can be elucidated with a multitude of experimental methods, yet integrating the resulting heterogeneous, large, and noisy data sets into comprehensive tissue- or disease-specific cellular models requires rigorous computational methods. Recently, several algorithms have been proposed that model genome-wide gene regulation as sets of (linear) equations over the activity and relationships of transcription factors, genes, and other factors. Subsequent optimization finds the parameters that minimize the divergence between predicted and measured expression intensities. In various settings, these methods have produced promising results in estimating transcription factor activity and identifying key biomarkers for specific phenotypes. However, despite their common root in mathematical optimization, they differ vastly in the types of experimental data being integrated, the background knowledge necessary for their application, the granularity of their regulatory model, the concrete paradigm used for solving the optimization problem, and the data sets used for evaluation.

Results: Here, we review five recent methods of this class in detail and compare them with respect to several key properties. Furthermore, we quantitatively compare the results of four of the presented methods based on publicly available data sets.

Conclusions: The results show that all methods seem to find biologically relevant information. However, we also observe that the mutual overlap between their results is very low, which contradicts biological intuition. Our aim is to raise further awareness of the power of these methods, yet also to identify common shortcomings and necessary extensions, enabling focused research on the critical points.
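The modelling idea outlined in the Background (linear equations relating regulator activity to gene expression, fitted by minimizing the divergence between predicted and measured intensities) can be illustrated with a minimal sketch. It is not the formulation of any of the reviewed methods. The function name `estimate_tf_activity`, the toy network matrix, the squared-error objective, and the optional `ridge` penalty are all assumptions chosen for illustration.

```python
import numpy as np


def estimate_tf_activity(network, expression, ridge=0.0):
    """Estimate regulator activities by (optionally ridge-penalized) least squares.

    network    : genes x TFs matrix of assumed regulatory weights (hypothetical input)
    expression : genes x samples matrix of measured expression values
    ridge      : non-negative penalty weight; 0 gives ordinary least squares
    Returns a TFs x samples matrix of estimated activities.
    """
    network = np.asarray(network, dtype=float)
    expression = np.asarray(expression, dtype=float)
    n_tfs = network.shape[1]

    # A ridge penalty is equivalent to appending sqrt(ridge) * I rows to the
    # design matrix and matching zero rows to the target.
    design = np.vstack([network, np.sqrt(ridge) * np.eye(n_tfs)])
    padding = np.zeros((n_tfs,) + expression.shape[1:])
    target = np.concatenate([expression, padding], axis=0)

    # Minimize || design @ activity - target ||^2 over the activity matrix.
    activity, *_ = np.linalg.lstsq(design, target, rcond=None)
    return activity


# Toy example (made-up numbers): 4 genes, 2 transcription factors, 2 samples.
network = np.array([
    [1.0, 0.0],
    [1.0, -1.0],
    [0.0, 1.0],
    [0.5, 0.5],
])
true_activity = np.array([
    [2.0, -1.0],
    [0.5, 1.5],
])
expression = network @ true_activity  # noise-free "measurements"

estimated = estimate_tf_activity(network, expression)
residual = np.linalg.norm(network @ estimated - expression)
print(estimated)
print(residual)
```

In this noise-free toy case the estimate should match `true_activity` up to numerical precision. With real data, the network matrix would have to come from background knowledge such as binding-site predictions, and the choice of objective and regularization is one of the points on which the reviewed methods differ.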
