NUMT PARSER: automated identification and removal of nuclear mitochondrial pseudogenes (numts) for accurate mitochondrial genome reconstruction in Panthera

Nuclear mitochondrial pseudogenes (numts) may hinder the reconstruction of mtDNA genomes and affect the reliability of mtDNA datasets for phylogenetic and population genetic comparisons. Here, we present the program Numt Parser, which allows for the identification of DNA sequences that likely originate from numt pseudogene DNA. Sequencing reads are classified as originating from either numt or true cytoplasmic mitochondrial (cymt) DNA by direct comparison against cymt and numt reference sequences. Classified reads can then be parsed into cymt or numt datasets. We tested this program using whole genome shotgun-sequenced data from two ancient Cape lions (Panthera leo), because mtDNA is often the marker of choice for ancient DNA studies and the genus Panthera is known to have numt pseudogenes. Numt Parser decreased sequence disagreements that were likely due to numt pseudogene contamination and equalized read coverage across the mitogenome by removing reads that likely originated from numts. We compared the efficacy of Numt Parser to two other bioinformatic approaches that can be used to account for numt contamination. We found that Numt Parser outperformed approaches that rely only on read alignment or Basic Local Alignment Search Tool (BLAST) properties, and was effective at identifying sequences that likely originated from numts while having minimal impacts on the recovery of cymt reads. Numt Parser therefore improves the reconstruction of true mitogenomes, allowing for more accurate and robust biological inferences.
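The classification step described above, comparing each read against both cymt and numt references and then parsing reads into separate datasets, can be illustrated with a minimal Python sketch. It is not Numt Parser's actual implementation or decision rule. It assumes each read has already been aligned to both references and that a per-read mismatch count (for example, an edit distance) is available for each alignment. The names `ReadAlignment`, `classify_read`, `partition_reads`, and the `min_diff` threshold are hypothetical and introduced only for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List, Optional


@dataclass
class ReadAlignment:
    """Hypothetical per-read summary; None means the read did not align."""
    read_id: str
    cymt_mismatches: Optional[int]
    numt_mismatches: Optional[int]


def classify_read(aln: ReadAlignment, min_diff: int = 1) -> str:
    """Label a read by whichever reference it matches with fewer mismatches.

    min_diff is an assumed threshold: the two mismatch counts must differ
    by at least this much, otherwise the read is called 'ambiguous'.
    """
    if aln.cymt_mismatches is None and aln.numt_mismatches is None:
        return "unmapped"
    if aln.numt_mismatches is None:
        return "cymt"
    if aln.cymt_mismatches is None:
        return "numt"
    diff = aln.numt_mismatches - aln.cymt_mismatches
    if diff >= min_diff:
        return "cymt"   # closer to the cytoplasmic reference
    if -diff >= min_diff:
        return "numt"   # closer to the nuclear pseudogene reference
    return "ambiguous"


def partition_reads(alns: Iterable[ReadAlignment], min_diff: int = 1) -> Dict[str, List[str]]:
    """Group read IDs by their classification label."""
    groups: Dict[str, List[str]] = {}
    for aln in alns:
        groups.setdefault(classify_read(aln, min_diff), []).append(aln.read_id)
    return groups


if __name__ == "__main__":
    reads = [
        ReadAlignment("r1", 0, 3),      # fewer mismatches to cymt
        ReadAlignment("r2", 4, 1),      # fewer mismatches to numt
        ReadAlignment("r3", 2, 2),      # tie
        ReadAlignment("r4", None, 0),   # aligned to numt only
    ]
    for label, ids in partition_reads(reads).items():
        print(label, ids)
```

In this sketch, reads with tied mismatch counts are held out as ambiguous rather than forced into either dataset. Whether such reads should be kept or discarded is a design choice that depends on how much cymt read loss is acceptable.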
