Odintifier - A computational method for identifying insertions of organellar origin from modern and ancient high-throughput sequencing data based on haplotype phasing

Background: Cellular organelles with genomes of their own (e.g. plastids and mitochondria) can pass genetic sequences to other organellar genomes within the cell in many species across the eukaryote phylogeny. The extent of the occurrence of these organellar-derived inserted sequences (odins) is still unknown, but if not accounted for in genomic and phylogenetic studies, they can be a source of error. However, if correctly identified, these inserted sequences can be used for evolutionary and comparative genomic studies. Although such insertions can be detected using various laboratory and bioinformatic strategies, there is currently no straightforward way to apply these strategies as part of a standard organellar genome assembly from next-generation sequencing data. Furthermore, most current methods for identifying such insertions are unsuitable for use on non-model organisms or ancient DNA datasets.

Results: We present a bioinformatic method that uses phasing algorithms to reconstruct both source and inserted organelle sequences. The method was tested on different shotgun and organellar-enriched DNA high-throughput sequencing (HTS) datasets from ancient and modern samples. Specifically, we used datasets from lions (Panthera leo ssp. and Panthera leo leo) to characterize insertions of mitochondrial origin, and from common grapevine (Vitis vinifera) and bugle (Ajuga reptans) to characterize insertions derived from plastid genomes. Comparison of the results against other available organelle genome assembly methods demonstrated that our new method improves the sequence assembly.

Conclusion: Using datasets from a wide range of species and different levels of complexity, we showed that our novel bioinformatic method based on phasing algorithms can be used to achieve the following two goals: i) reference-guided assembly of chloroplast/mitochondrial genomes from HTS data and ii) identification and simultaneous assembly of odins.
This method represents the first application of haplotype phasing for automatic detection of odins and reference-based organellar genome assembly.
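
To make the idea of phasing-based separation concrete, the following is a minimal toy sketch and not the Odintifier implementation. It assumes that reads have already been reduced to their alleles at variable sites, that a simple greedy read-overlap rule is enough to split them into two haplotypes, and that the haplotype supported by more reads is the organellar source while the less supported one is a candidate odin. The function names (`phase_reads`, `label_haplotypes`), the input layout, and the read-count labeling rule are all hypothetical choices made for illustration.

```python
def phase_reads(reads):
    """Greedily split reads into two haplotype groups.

    Each read is a dict mapping a variable-site position to the allele
    observed at that site. A read joins the haplotype it agrees with best
    (matches minus mismatches); ties go to the first haplotype.
    """
    haps = [{}, {}]      # allele recorded per site for each haplotype
    counts = [0, 0]      # number of reads assigned to each haplotype
    for read in reads:
        scores = []
        for hap in haps:
            score = 0
            for pos, allele in read.items():
                if pos in hap:
                    score += 1 if hap[pos] == allele else -1
            scores.append(score)
        idx = 0 if scores[0] >= scores[1] else 1
        counts[idx] += 1
        # Extend the chosen haplotype with any sites it has not seen yet.
        for pos, allele in read.items():
            haps[idx].setdefault(pos, allele)
    return haps, counts


def label_haplotypes(reads):
    """Label the better-supported haplotype as the source (toy assumption)."""
    haps, counts = phase_reads(reads)
    src, ins = (0, 1) if counts[0] >= counts[1] else (1, 0)

    def as_string(hap):
        # Concatenate alleles in order of site position.
        return "".join(hap[pos] for pos in sorted(hap))

    return {"source": as_string(haps[src]),
            "candidate_odin": as_string(haps[ins])}


# Toy input: three variable sites (10, 20, 30); four reads carry the
# alleles A-C-G and two reads carry T-G-A.
toy_reads = [
    {10: "A", 20: "C"},
    {10: "T", 20: "G"},
    {20: "C", 30: "G"},
    {20: "G", 30: "A"},
    {10: "A", 20: "C"},
    {20: "C", 30: "G"},
]
result = label_haplotypes(toy_reads)
print(result)
```

A greedy pass like this depends on read order and on consecutive reads overlapping at variable sites; dedicated phasing algorithms handle sequencing errors, gaps in coverage, and ambiguous reads far more robustly.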

[41]  J. Timmis,et al.  Environmental stress increases the entry of cytoplasmic organellar DNA into the nucleus in plants , 2012, Proceedings of the National Academy of Sciences.

[42]  D. Smith Extending the Limited Transfer Window Hypothesis to Inter-organelle DNA Migration , 2011, Genome biology and evolution.

[43]  Richard Durbin,et al.  Sequence analysis Fast and accurate short read alignment with Burrows – Wheeler transform , 2009 .

[44]  Sorin Istrail,et al.  HapCompass: A Fast Cycle Basis Algorithm for Accurate Haplotype Assembly of Sequence Data , 2012, J. Comput. Biol..

[45]  E. Haring,et al.  Unusual Origin of a Nuclear Pseudogene in the Italian Wall Lizard: Intergenomic and Interspecific Transfer of a Large Section of the Mitochondrial Genome in the Genus Podarcis (Lacertidae) , 2007, Journal of Molecular Evolution.

[46]  B. Browning,et al.  A unified approach to genotype imputation and haplotype-phase inference for large data sets of trios and unrelated individuals. , 2009, American journal of human genetics.

[47]  S. Pääbo,et al.  Unreliable mtDNA data due to nuclear insertions: a cautionary tale from analysis of humans and other great apes , 2004, Molecular ecology.

[48]  A. Pandit,et al.  Complete mitogenome of asiatic lion resolves phylogenetic status within Panthera , 2013, BMC Genomics.

[49]  Robert S. Harris,et al.  Improved pairwise alignment of genomic dna , 2007 .

[50]  A. Yoder,et al.  Using secondary structure to identify ribosomal numts: cautionary examples from the human genome. , 2002, Molecular biology and evolution.

[51]  B. Gjerde Characterisation of full-length mitochondrial copies and partial nuclear copies (numts) of the cytochrome b and cytochrome c oxidase subunit I genes of Toxoplasma gondii, Neospora caninum, Hammondia heydorni and Hammondia triffittae (Apicomplexa: Sarcocystidae) , 2013, Parasitology Research.

[52]  R Core Team,et al.  R: A language and environment for statistical computing. , 2014 .

[53]  M. Jensen-Seaman,et al.  Identification of species-specific nuclear insertions of mitochondrial DNA (numts) in gorillas and their potential as population genetic markers. , 2014, Molecular phylogenetics and evolution.

[54]  C. Stewart,et al.  Insertions and duplications of mtDNA in the nuclear genomes of Old World monkeys and hominoids , 1995, Nature.

[55]  Dan Graur,et al.  A comparative analysis of numt evolution in human and chimpanzee. , 2007, Molecular biology and evolution.

[56]  V. Friesen,et al.  Sequence variation in the guillemot (Alcidae: Cepphus) mitochondrial control region and its nuclear homolog. , 1998, Molecular biology and evolution.

[57]  Qiaomei Fu,et al.  The complete mitochondrial DNA genome of an unknown hominin from southern Siberia , 2010, Nature.

[58]  Marcy R. Auerbach,et al.  A quick, direct method that can differentiate expressed mitochondrial genes from their nuclear pseudogenes , 1996, Current Biology.

[59]  Leo van Iersel,et al.  On the Complexity of Several Haplotyping Problems , 2005, WABI.

[60]  S. P. Fodor,et al.  Blocks of Limited Haplotype Diversity Revealed by High-Resolution Scanning of Human Chromosome 21 , 2001, Science.

[61]  M. Tiirola,et al.  Reliability of mitochondrial DNA in an acanthocephalan: the problem of pseudogenes. , 2006, International journal for parasitology.

[62]  A. Antunes,et al.  Discovery of a large number of previously unrecognized mitochondrial pseudogenes in fish genomes. , 2005, Genomics.

[63]  Helga Thorvaldsdóttir,et al.  Integrative Genomics Viewer (IGV): high-performance genomics data visualization and exploration , 2012, Briefings Bioinform..

[64]  Jeffrey P. Mower,et al.  Unprecedented heterogeneity in the synonymous substitution rate within a plant genome. , 2014, Molecular biology and evolution.

[65]  R. Viola,et al.  Mitochondrial DNA of Vitis vinifera and the issue of rampant horizontal gene transfer. , 2008, Molecular biology and evolution.

[66]  Steven J. M. Jones,et al.  Circos: an information aesthetic for comparative genomics. , 2009, Genome research.

[67]  Yun Sung Cho,et al.  The tiger genome and comparative analysis with lion and snow leopard genomes , 2013, Nature Communications.

[68]  F. Delsuc,et al.  Next-generation sequencing and phylogenetic signal of complete mitochondrial genomes for resolving the evolutionary history of leaf-nosed bats (Phyllostomidae). , 2013, Molecular phylogenetics and evolution.

[69]  E. Kejnovský,et al.  Analysis of plastid and mitochondrial DNA insertions in the nucleus (NUPTs and NUMTs) of six plant species: size, relative age and chromosomal localization , 2013, Heredity.

[70]  Jared C. Roach,et al.  Chromosomal haplotypes by genetic phasing of human families. , 2011, American journal of human genetics.

[71]  R. Hutterer,et al.  False phylogenies on wood mice due to cryptic cytochrome-b pseudogene. , 2009, Molecular phylogenetics and evolution.

[72]  J. V. López,et al.  Complete nucleotide sequences of the domestic cat (Felis catus) mitochondrial genome and a transposed mtDNA tandem repeat (Numt) in the nuclear genome. , 1996, Genomics.

[73]  John C. Marioni,et al.  Effect of read-mapping biases on detecting allele-specific expression from RNA-sequencing data , 2009, Bioinform..

[74]  Shuo Wang,et al.  Mitochondrial genome of the African lion Panthera leo leo , 2015, Mitochondrial DNA.

[75]  Qingwei Li,et al.  Comparative analysis of mitochondrial fragments transferred to the nucleus in vertebrate. , 2008, Journal of genetics and genomics = Yi chuan xue bao.

[76]  T. Lindahl Instability and decay of the primary structure of DNA , 1993, Nature.

[77]  D. Bensasson,et al.  Frequent assimilation of mitochondrial DNA by grasshopper nuclear genomes. , 2000, Molecular biology and evolution.

[78]  L L Cavalli-Sforza,et al.  The phylogeography of Y chromosome binary haplotypes and the origins of modern human populations , 2001, Annals of human genetics.

[79]  S. Behura Analysis of nuclear copies of mitochondrial sequences in honeybee (Apis mellifera) genome. , 2007, Molecular biology and evolution.

[80]  J. Blanchard,et al.  Mitochondrial DNA migration events in yeast and humans: integration by a common end-joining mechanism and alternative perspectives on nucleotide substitution patterns. , 1996, Molecular biology and evolution.

[81]  G. Gyapay,et al.  Location score and haplotype analyses of the locus for autosomal recessive spastic ataxia of Charlevoix-Saguenay, in chromosome region 13q11. , 1999, American journal of human genetics.

[82]  S. O’Brien,et al.  Evolutionary analysis of a large mtDNA translocation (numt) into the nuclear genome of the Panthera genus species. , 2006, Gene.

[83]  M. Hofreiter,et al.  Assessing ancient DNA studies. , 2005, Trends in ecology & evolution.

[84]  M. Nei,et al.  MEGA5: molecular evolutionary genetics analysis using maximum likelihood, evolutionary distance, and maximum parsimony methods. , 2011, Molecular biology and evolution.

[85]  J. Birchler,et al.  Recent and Frequent Insertions of Chloroplast DNA into Maize Nuclear Chromosomes , 2010, Cytogenetic and Genome Research.