Eukaryotic regulatory element conservation analysis and identification using comparative genomics.

Comparative genomics is a promising approach to the challenging problem of eukaryotic regulatory element identification, because functional noncoding sequences may be conserved across species owing to evolutionary constraints. We systematically analyzed known human and Saccharomyces cerevisiae regulatory elements and found that human regulatory elements are more conserved between human and mouse than are background sequences. Although S. cerevisiae regulatory elements do not appear to be more conserved than background in a pairwise comparison of S. cerevisiae with Schizosaccharomyces pombe, they are more conserved when S. cerevisiae is compared with multiple other yeast genomes (Saccharomyces paradoxus, Saccharomyces mikatae, and Saccharomyces bayanus). Based on these analyses, we developed a sequence-motif-finding algorithm called CompareProspector, which extends Gibbs sampling by biasing the search toward regions conserved across species. Using human-mouse comparison, CompareProspector identified known motifs for the transcription factors Mef2, Myf, Srf, and Sp1 from a set of human muscle-specific genes. It also discovered the NFAT motif from genes up-regulated by CD28 stimulation in T cells, which implies the direct involvement of NFAT in mediating the CD28 stimulatory signal. Using Caenorhabditis elegans-Caenorhabditis briggsae comparison, CompareProspector found the PHA-4 motif and the UNC-86 motif. CompareProspector outperformed many other computational motif-finding programs, demonstrating the power of comparative genomics-based biased sampling in eukaryotic regulatory element identification.
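The idea of biasing a Gibbs sampler toward conserved regions can be illustrated with a minimal Python sketch. This is not the CompareProspector implementation: the abstract does not specify how conservation enters the sampling step, so the sketch assumes one simple scheme in which each candidate site's motif likelihood ratio is multiplied by the mean conservation of its window. The function names (`build_counts`, `window_weights`, `biased_gibbs`), the per-base conservation scores in [0, 1], the uniform background, the `floor` parameter, and the one-site-per-sequence model are all hypothetical choices made for illustration.

```python
import random

BASES = "ACGT"  # assumption: sequences are uppercase and contain only A, C, G, T


def build_counts(seqs, starts, width, skip, pseudo=0.5):
    """Position-specific base counts from the current sites, leaving out sequence `skip`."""
    counts = [[pseudo] * 4 for _ in range(width)]
    for i, (seq, start) in enumerate(zip(seqs, starts)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][BASES.index(seq[start + j])] += 1
    return counts


def window_weights(seq, cons, counts, width, bg=0.25):
    """Unnormalized sampling weight for every window start in `seq`.

    weight = (motif vs. uniform-background likelihood ratio)
             * (mean conservation over the window)   <- the assumed bias term
    """
    totals = [sum(col) for col in counts]
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            ratio *= counts[j][BASES.index(seq[start + j])] / totals[j] / bg
        mean_cons = sum(cons[start:start + width]) / width
        weights.append(ratio * mean_cons)
    return weights


def biased_gibbs(seqs, conservation, width, iters=200, seed=0, floor=0.05):
    """Site sampler with one site per sequence, biased toward conserved windows.

    `conservation[i][k]` is a hypothetical per-base score in [0, 1] for base k
    of sequence i (for example, derived from a cross-species alignment).
    `floor` keeps unconserved windows reachable with a small probability.
    """
    rng = random.Random(seed)
    starts = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iters):
        for i, seq in enumerate(seqs):
            counts = build_counts(seqs, starts, width, skip=i)
            cons = [max(c, floor) for c in conservation[i]]
            weights = window_weights(seq, cons, counts, width)
            starts[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return starts


if __name__ == "__main__":
    # Toy input: three short sequences and made-up conservation tracks.
    toy_seqs = ["ACGTTGACGT", "TTGACGTACG", "GGTTGACGAA"]
    toy_cons = [[0.9] * 10, [0.8] * 10, [0.7] * 10]
    print(biased_gibbs(toy_seqs, toy_cons, width=5, iters=50, seed=1))
```

With every conservation score set to the same value, the bias term cancels in the normalization and the sketch reduces to an ordinary site sampler. Other weighting schemes, such as restricting the search to windows above a conservation threshold, would fit the same description in the abstract.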
