21. Computational Methods for Analysis of Transcriptional Regulation

Understanding the mechanisms of transcriptional regulation is a key step in understanding many biological processes. Many computational algorithms have been developed to tackle this problem by identifying (1) the binding motifs, (2) the binding sites, and (3) the regulatory targets of given transcription factors. In this chapter, we survey the methods and algorithms currently used to solve each of these subproblems. We also focus on the newer subarea of machine learning (ML) methods, which provide a framework for a new set of approaches to these problems. We highlight the connections between these ML algorithms and conventional position weight matrix (PWM)-based algorithms, and suggest that ML algorithms can often generalize and extend the capabilities of existing methods.
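
To make the conventional PWM-based approach to the binding-site subproblem concrete, the following is a minimal sketch, not a method taken from the chapter. It assumes a small set of aligned example sites, a uniform background, and a pseudocount of 1; the names `SITES`, `BACKGROUND`, `build_pwm`, and `scan` are hypothetical and chosen only for illustration.

```python
import math

# Hypothetical toy data: four aligned binding sites, assumed for illustration.
SITES = ["TGACTCA", "TGAGTCA", "TGACTCA", "TTACTCA"]
# Assumed uniform background nucleotide frequencies.
BACKGROUND = {"A": 0.25, "C": 0.25, "G": 0.25, "T": 0.25}


def build_pwm(sites, background, pseudocount=1.0):
    """Build a log2-odds position weight matrix from aligned sites.

    Returns one dict per motif position, mapping base -> log2(p / background).
    """
    width = len(sites[0])
    pwm = []
    for i in range(width):
        # Start from the pseudocount so unseen bases keep a finite score.
        counts = {base: pseudocount for base in "ACGT"}
        for site in sites:
            counts[site[i]] += 1
        total = sum(counts.values())
        pwm.append(
            {b: math.log2((counts[b] / total) / background[b]) for b in "ACGT"}
        )
    return pwm


def scan(sequence, pwm):
    """Score every window of the sequence; return (offset, score) pairs."""
    width = len(pwm)
    return [
        (start, sum(col[sequence[start + j]] for j, col in enumerate(pwm)))
        for start in range(len(sequence) - width + 1)
    ]


pwm = build_pwm(SITES, BACKGROUND)
hits = scan("ACGTTGACTCAGGCTA", pwm)
best_offset, best_score = max(hits, key=lambda hit: hit[1])
print(best_offset, round(best_score, 2))
```

In practice a score threshold, rather than the single best window, would decide which windows are reported as candidate sites. A sketch like this scores each position independently, which is one of the assumptions that more general ML models can relax.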

