MINTyper: A method for generating phylogenetic distance matrices with long read sequencing data

In this paper we present a complete pipeline for generating a phylogenetic distance matrix from a set of sequencing reads. Importantly, the program accepts a mix of short reads from Illumina sequencing platforms and long reads from Oxford Nanopore Technologies (ONT) platforms as input. By combining automated reference identification, KMA alignment, optional methylation masking, recombination SNP pruning and pairwise distance calculation, the pipeline rapidly and accurately calculates the phylogenetic distances between a set of sequenced isolates with a presumed epidemiological relation. Functions were built to allow both high-accuracy base-called MinION reads (hac Q10) and fast-generated, lower-quality reads (fast Q8) to be used. With correct input parameters, the phylogenetic results for the different qualities of ONT data were nearly identical; however, a higher number of base pairs was excluded from the calculated distance matrix when fast Q8 reads were used.
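The final step, pairwise distance calculation with exclusion of unreliable positions, can be illustrated with a minimal sketch. It is not MINTyper's implementation: the function name `pairwise_distances`, the use of `N` to mark a masked or uncertain base, and the rule of dropping a position as soon as any isolate is ambiguous there are all assumptions made for illustration. The input is assumed to be one consensus sequence per isolate, all aligned to the same reference and therefore of equal length.

```python
from itertools import combinations

# Assumption: only these four symbols count as confidently called bases;
# anything else (e.g. "N") marks a masked or uncertain position.
VALID_BASES = set("ACGT")


def pairwise_distances(seqs):
    """Return (distance matrix, number of excluded positions).

    seqs maps a hypothetical isolate name to its reference-aligned
    consensus sequence. A position is excluded for ALL pairs as soon as
    one isolate has an ambiguous base there, so every distance is
    computed over the same set of positions.
    """
    names = list(seqs)
    if not names:
        return {}, 0
    length = len(seqs[names[0]])
    if any(len(s) != length for s in seqs.values()):
        raise ValueError("all sequences must have the same length")

    upper = {n: seqs[n].upper() for n in names}
    # Positions where every isolate has an unambiguous base.
    kept = [i for i in range(length)
            if all(upper[n][i] in VALID_BASES for n in names)]

    matrix = {a: {b: 0 for b in names} for a in names}
    for a, b in combinations(names, 2):
        # Count mismatching bases over the shared positions only.
        d = sum(1 for i in kept if upper[a][i] != upper[b][i])
        matrix[a][b] = matrix[b][a] = d
    return matrix, length - len(kept)


# Toy example with made-up sequences; iso3 has one uncertain base.
toy = {"iso1": "ACGTACGT", "iso2": "ACGTACGA", "iso3": "ACNTTCGA"}
matrix, excluded = pairwise_distances(toy)
print(matrix["iso1"]["iso3"], excluded)
```

Under this exclusion rule, lower-quality reads that leave more positions uncertain shrink the set of compared positions without necessarily changing the distances, which is one way to read the fast Q8 observation above.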
