Prioritize Transcription Factor Binding Sites for Multiple Co-Expressed Gene Sets Based on Lasso Multinomial Regression Models

Computational prediction of cis-regulatory elements for a set of co-expressed genes from sequence analysis yields an overwhelming number of potential transcription factor binding sites. The challenge is to prioritize the functional transcription factors, and their binding sites in the regulatory regions of the target genes, that are relevant to the gene expression study. We propose a novel approach based on lasso multinomial regression models to address this problem. We examine the ability of the lasso models using time-course microarray data obtained from a comprehensive study of gene expression profiles in mouse skin and mucosa over all stages of wound healing.
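As a rough illustration of the idea, the sketch below fits an L1-penalized (lasso) multinomial logistic regression in which each row is a gene, each column is a candidate binding-site feature, and each class is a co-expressed gene set. It is not the authors' implementation: the optimizer is plain proximal gradient descent, chosen for brevity, and the function names, the toy feature names (TF_A, TF_B, TF_C, TF_noise), the data, and the penalty value are all assumptions made for this example. Features whose coefficients are shrunk exactly to zero for every gene set are deprioritized; the rest are ranked by their largest absolute coefficient.

```python
import math


def soft_threshold(value, threshold):
    """Proximal operator of the L1 penalty: shrink toward zero, clip at zero."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def softmax(scores):
    """Numerically stable softmax over a list of class scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def fit_lasso_multinomial(X, y, n_classes, lam=0.05, lr=0.5, n_iter=500):
    """Return a (features x classes) weight matrix fitted by proximal gradient descent."""
    n, p = len(X), len(X[0])
    W = [[0.0] * n_classes for _ in range(p)]
    for _ in range(n_iter):
        grad = [[0.0] * n_classes for _ in range(p)]
        for row, label in zip(X, y):
            logits = [sum(row[j] * W[j][k] for j in range(p)) for k in range(n_classes)]
            probs = softmax(logits)
            for k in range(n_classes):
                residual = probs[k] - (1.0 if k == label else 0.0)
                for j in range(p):
                    grad[j][k] += row[j] * residual / n
        # Gradient step on the log-loss, then soft-thresholding for the L1 term.
        for j in range(p):
            for k in range(n_classes):
                W[j][k] = soft_threshold(W[j][k] - lr * grad[j][k], lr * lam)
    return W


def prioritize(W, names):
    """Rank features by their largest absolute coefficient across gene sets."""
    scores = [(name, max(abs(w) for w in row)) for name, row in zip(names, W)]
    return sorted(scores, key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): 3 co-expressed gene sets with 4 genes each.
# TF_A, TF_B and TF_C each mark one gene set; TF_noise occurs in half of
# the genes of every set and so carries no information about membership.
names = ["TF_A", "TF_B", "TF_C", "TF_noise"]
X, y = [], []
for gene_set in range(3):
    for gene in range(4):
        features = [1.0 if j == gene_set else 0.0 for j in range(3)]
        features.append(1.0 if gene < 2 else 0.0)
        X.append(features)
        y.append(gene_set)

W = fit_lasso_multinomial(X, y, n_classes=3)
ranking = prioritize(W, names)
for name, score in ranking:
    print(f"{name}\t{score:.3f}")
```

On this toy input the uninformative feature receives a coefficient of exactly zero for every gene set, while the three set-specific features keep nonzero weights. With real data the penalty strength would normally be chosen by cross-validation rather than fixed in advance.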
