Analysis of multiple genomic sequence alignments: a web resource, online tools, and lessons learned from analysis of mammalian SCL loci.

Comparative analysis of genomic sequences is becoming a standard technique for studying gene regulation. However, only a limited number of tools are currently available for the analysis of multiple genomic sequences. The SCL gene locus provides an extensive data set for testing and training such tools. Here we have expanded this data set to eight vertebrate species by sequencing the dog SCL locus and by annotating the dog and rat SCL loci. To provide a resource for the bioinformatics community, all SCL sequences and functional annotations, which comprise a collation of the extensive experimental evidence on SCL regulation, have been made available via a Web server. We also implemented a Web interface to new tools designed specifically for the display and analysis of multiple sequence alignments. The SCL data set and the new sequence comparison tools allowed us to examine rigorously the benefits of multiple sequence comparisons. We demonstrate that, overall, multiple sequence alignments are superior to pairwise alignments for identifying mammalian regulatory regions. In the search for individual transcription factor binding sites, multiple alignments markedly increase the signal-to-noise ratio compared with pairwise alignments.
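The signal-to-noise argument for binding-site searches can be illustrated with a minimal phylogenetic-footprinting sketch. It is not the authors' tool or scoring method. It assumes a precomputed alignment whose rows are padded with '-' to equal length, a single IUPAC consensus (the GATA-like `WGATAR`, chosen only as an example), forward-strand matching, and exact motif conservation at the same alignment columns. The function names `motif_regex` and `conserved_hits` and the toy sequences are hypothetical.

```python
import re
from typing import Dict, List, Sequence

# Subset of IUPAC nucleotide codes mapped to regex character classes.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "[AG]", "Y": "[CT]", "W": "[AT]", "S": "[CG]",
    "K": "[GT]", "M": "[AC]", "N": "[ACGT]",
}


def motif_regex(consensus: str) -> re.Pattern:
    """Compile an IUPAC consensus string (e.g. 'WGATAR') into a regex."""
    return re.compile("".join(IUPAC[base] for base in consensus.upper()))


def conserved_hits(
    alignment: Dict[str, str],
    reference: str,
    consensus: str,
    required: Sequence[str] = (),
) -> List[int]:
    """Return alignment columns where the motif matches in the reference AND
    in every species listed in `required`, at the same position.

    Only the forward strand is scanned. Gap characters ('-') never match a
    motif character, so a window containing a gap in any checked row fails.
    """
    pattern = motif_regex(consensus)
    width = len(consensus)
    names = [reference, *required]
    n_cols = len(alignment[reference])
    return [
        start
        for start in range(n_cols - width + 1)
        if all(
            pattern.fullmatch(alignment[name][start:start + width].upper())
            for name in names
        )
    ]


# Hypothetical toy alignment (not SCL sequence): three candidate sites in
# "human"; the first is shared by all rows, the second only with "mouse",
# and the third only with "dog".
TOY_ALIGNMENT = {
    "human": "AGATAACCTGATAGCCAGATAG",
    "mouse": "AGATAACCTGATAGCCAGCTAG",
    "dog":   "AGATAACCTCATAGCCAGATAG",
}

if __name__ == "__main__":
    for required in ([], ["mouse"], ["dog"], ["mouse", "dog"]):
        hits = conserved_hits(TOY_ALIGNMENT, "human", "WGATAR", required)
        print("human +", required or "nothing", "->", hits)
```

Running the script prints three hits for human alone, two for each pairwise comparison, and one when both other species must agree. Each additional species that must share the match discards sites conserved in only one pair, which is the intuition behind preferring multiple over pairwise alignments when filtering candidate binding sites.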
