MrBait: universal identification and design of targeted‐enrichment capture probes

Motivation: Identifying and designing capture probes (‘baits’) for the diverse array of targeted‐enrichment methods now available (e.g. ultra‐conserved elements, anchored hybrid enrichment, RAD‐capture) is a non‐trivial task. It often involves parsing large genomic alignments, followed by multiple rounds of curating candidate genomic regions to maximize targeted information content (e.g. genetic variation) and to minimize potential probe dimerization and non‐target enrichment.

Results: We developed MrBait, a user‐friendly, generalized software pipeline for the identification, design and optimization of targeted‐enrichment probes across a range of target‐capture paradigms. MrBait is open source, leverages native parallelization capabilities in Python and limits memory usage via a relational‐database back‐end. Numerous filtering methods allow comprehensive optimization of designed probes, including built‐in functionality that employs BLAST, similarity‐based clustering and a graph‐based algorithm that ‘rescues’ failed probes.

Availability and implementation: Complete code for MrBait is available on GitHub (https://github.com/tkchafin/mrbait), and can also be installed with all dependencies in one line using the conda package manager. Online documentation describing installation and runtime instructions can be found at: https://mrbait.readthedocs.io.

Supplementary information: Supplementary data are available at Bioinformatics online.
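To make the idea of probe design and filtering concrete, the following is a minimal sketch of tiling candidate baits across a target region and discarding those with ambiguous bases or extreme GC content. It is an illustration only and is not MrBait's code or API: the function names (`tile_baits`, `gc_fraction`, `filter_baits`), the parameters and their default values (80 bp baits, 40 bp step, GC between 0.3 and 0.7) are all assumptions chosen for the example.

```python
from typing import Iterator, List, Tuple


def tile_baits(target: str, bait_len: int = 80, step: int = 40) -> Iterator[Tuple[int, str]]:
    """Yield (start, sequence) for full-length baits tiled across a target.

    Hypothetical helper: windows that would run past the end of the
    target are skipped rather than truncated.
    """
    for start in range(0, len(target) - bait_len + 1, step):
        yield start, target[start:start + bait_len]


def gc_fraction(seq: str) -> float:
    """Return the fraction of G and C bases among all characters in seq."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def filter_baits(
    target: str,
    bait_len: int = 80,
    step: int = 40,
    min_gc: float = 0.3,
    max_gc: float = 0.7,
    max_n: int = 0,
) -> List[Tuple[int, str]]:
    """Keep tiled baits that pass simple N-count and GC-content filters.

    The thresholds are illustrative assumptions, not recommended values.
    """
    kept = []
    for start, bait in tile_baits(target, bait_len, step):
        # Drop baits containing more ambiguous bases than allowed.
        if bait.upper().count("N") > max_n:
            continue
        # Drop baits whose GC content falls outside the accepted range.
        if min_gc <= gc_fraction(bait) <= max_gc:
            kept.append((start, bait))
    return kept
```

For example, `filter_baits("ACGTAAAAGGCCNACG", bait_len=4, step=4)` keeps only the first window, because the other three fail on low GC, high GC and an ambiguous base, respectively. Filters based on BLAST hits, clustering or pairwise conflicts between probes, as mentioned in the abstract, would be applied as further stages and are not sketched here.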
