Genic regions of a large salamander genome contain long introns and novel genes

Background
The basis of genome size variation remains an outstanding question because DNA sequence data are lacking for organisms with large genomes. Sixteen BAC clones from the Mexican axolotl (Ambystoma mexicanum; C-value = 32 × 10⁹ bp) were isolated and sequenced to characterize the structure of genic regions.

Results
Annotation of genes within the BACs showed that axolotl introns are on average 10× longer than orthologous vertebrate introns, and they are predicted to contain more functional elements, including miRNAs and snoRNAs. Loci for two novel EST transcripts that are differentially expressed during spinal cord regeneration and skin metamorphosis were discovered within the BACs. Unexpectedly, a third novel gene was discovered during manual annotation of the BACs. Analysis of human and axolotl protein-coding sequences suggests that the axolotl genome contains 2% more lineage-specific genes than the human genome, but the great majority (86%) of genes compared between axolotl and human are predicted to be 1:1 orthologs. Because axolotl genes are on average 5× larger than human genes, the genic component of the salamander genome is estimated to be very large, approximately 2.8 gigabases (Gb).

Conclusion
This study shows that a large salamander genome has a correspondingly large genic component, primarily because its genes have exceptionally long introns. These intronic sequences may harbor novel coding and non-coding sequences that regulate biological processes unique to salamanders.
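As a rough consistency check, the two size figures quoted in the abstract (a 32 × 10⁹ bp genome and a genic component of about 2.8 Gb) can be compared directly. The Python sketch below performs only that division and subtraction; it does not reproduce how the 2.8 Gb estimate itself was derived, and the constant and function names are illustrative, not taken from the study.

```python
# Back-of-envelope check using only the figures quoted in the abstract.
GENOME_SIZE_BP = 32 * 10**9      # axolotl C-value: 32 x 10^9 bp
GENIC_ESTIMATE_BP = 28 * 10**8   # estimated genic component: ~2.8 Gb


def genic_fraction(genic_bp: int, genome_bp: int) -> float:
    """Fraction of the genome covered by the genic estimate."""
    return genic_bp / genome_bp


fraction = genic_fraction(GENIC_ESTIMATE_BP, GENOME_SIZE_BP)
non_genic_bp = GENOME_SIZE_BP - GENIC_ESTIMATE_BP  # sequence outside the genic estimate

print(f"Genic fraction:      {fraction:.2%}")
print(f"Non-genic remainder: {non_genic_bp / 1e9:.1f} Gb")
```

By this arithmetic, roughly 8.75% of the axolotl genome falls within the genic estimate, leaving about 29.2 Gb outside it, so even a 2.8 Gb genic component accounts for well under a tenth of the genome.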