SKESA: strategic k-mer extension for scrupulous assemblies

SKESA is a de Bruijn graph-based de novo assembler designed for assembling reads of microbial genomes sequenced using Illumina. Comparison with SPAdes and MegaHit shows that SKESA produces assemblies with high sequence quality and contiguity, handles low-level contamination in reads, is fast, and produces an identical assembly for the same input when run multiple times with the same or different compute resources. SKESA has been used to assemble over 272,000 read sets in the Sequence Read Archive at NCBI and for real-time pathogen detection. Source code for SKESA is freely available at https://github.com/ncbi/SKESA/releases.
