DiscoMark: Nuclear marker discovery from orthologous sequences using draft genome data

High-throughput sequencing has laid the foundation for fast and cost-effective development of phylogenetic markers. Here we present the program DISCOMARK, which streamlines the development of nuclear DNA (nDNA) markers from whole-genome (or whole-transcriptome) sequencing data. It combines local alignment, alignment trimming, reference mapping and primer design based on multiple sequence alignments to design primer pairs from input orthologous sequences. To demonstrate the suitability of DISCOMARK, we designed markers for two groups of species, one consisting of closely related species and one of distantly related species. For the closely related members of the species complex of Cloeon dipterum s.l. (Insecta, Ephemeroptera), the program discovered a total of 78 markers. Among these, we selected eight markers for amplification and Sanger sequencing. The exon sequence alignments (2,526 base pairs (bp)) were used to reconstruct a well-supported phylogeny and to infer clearly structured haplotype networks. For the distantly related species, we designed primers for several families in the insect order Ephemeroptera, using available genomic data from four sequenced species. We developed primer pairs for 23 markers that are designed to amplify across several families. The DISCOMARK program will enhance the development of new nDNA markers by providing a streamlined, automated approach to perform genome-scale scans for phylogenetic markers. The program is written in Python and released under a public license (GNU GPL v2). It is available, together with a manual and an example data set, at: https://github.com/hdetering/discomark.
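The general idea of primer design based on multiple sequence alignments can be illustrated with a minimal Python sketch. This is not DISCOMARK's implementation, whose details are not given here. It only assumes that candidate primer sites are gap-free alignment windows in which all sequences agree, and that a forward and a reverse site are paired when the spanned product length falls within a chosen range. The function names `conserved_windows` and `candidate_pairs` and all parameters are hypothetical.

```python
def conserved_windows(alignment, width=20, max_variable_cols=0):
    """Return start indices of windows that are conserved across all sequences.

    `alignment` is a list of equal-length aligned strings ('-' marks a gap).
    A column counts as conserved if every sequence has the same non-gap base.
    Illustrative assumption only, not DISCOMARK's actual criterion.
    """
    n_cols = len(alignment[0])
    if any(len(seq) != n_cols for seq in alignment):
        raise ValueError("all aligned sequences must have the same length")

    # One boolean per alignment column: identical base in all rows, no gaps.
    conserved = [
        len(set(col)) == 1 and col[0] != "-"
        for col in zip(*alignment)
    ]

    starts = []
    for start in range(n_cols - width + 1):
        variable = width - sum(conserved[start:start + width])
        if variable <= max_variable_cols:
            starts.append(start)
    return starts


def candidate_pairs(starts, width, min_product, max_product):
    """Pair non-overlapping forward/reverse windows by product length.

    The product length is measured from the first column of the forward
    window to the last column of the reverse window.
    """
    pairs = []
    for fwd in starts:
        for rev in starts:
            product = rev + width - fwd
            if rev >= fwd + width and min_product <= product <= max_product:
                pairs.append((fwd, rev))
    return pairs


# Toy alignment of three orthologous sequences; column 4 is variable.
toy_alignment = ["ACGTACGTAA", "ACGTTCGTAA", "ACGTACGTAA"]
toy_starts = conserved_windows(toy_alignment, width=4)
toy_pairs = candidate_pairs(toy_starts, width=4, min_product=8, max_product=10)
```

In this toy case the conserved windows flank the single variable column, so any resulting amplicon would contain the informative site. A real pipeline would additionally check melting temperature, GC content and primer self-complementarity, which this sketch leaves out.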
