Whole genome sequencing of Group A Streptococcus: development and evaluation of an automated pipeline for emm gene typing

Streptococcus pyogenes, group A streptococcus (GAS), is the most common cause of bacterial throat infections and can cause mild to severe skin and soft tissue infections, including impetigo, erysipelas and necrotizing fasciitis, as well as systemic and potentially fatal infections including septicaemia and meningitis. The estimated annual incidence of invasive group A streptococcal infection (iGAS) in industrialised countries is approximately 3 per 100,000 population. Typing is currently used in England and Wales to monitor strains of S. pyogenes causing invasive infections and those isolated from patients and healthcare/care workers in cluster and outbreak situations. Sequence analysis of the emm gene is the currently accepted gold standard methodology for GAS typing. A comprehensive database of emm types observed in superficial and invasive GAS strains from England and Wales informs outbreak control teams during investigations. Each year the Bacterial Reference Department, Public Health England (PHE), receives approximately 3000 GAS isolates from England and Wales. In April 2014 the Bacterial Reference Department, PHE, began genomic sequencing of referred S. pyogenes isolates and those pertaining to selected elderly/nursing care or maternity clusters from 2010 to inform future reference services and outbreak analysis (n=3047). In line with the modernizing strategy of PHE, we developed a novel bioinformatics pipeline that can predict emm types using whole genome sequence (WGS) data. The performance of this method was measured by comparing the emm type it assigned against the result from the current gold standard methodology; concordance to emm subtype level was observed in 93.8% (2852/3040) of cases, and in a further 2.4% (n=72) of cases concordance was observed to emm type level.
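The comparison described above can be illustrated with a minimal Python sketch. It is not the pipeline's actual code: the function names (`split_emm`, `classify`, `concordance_summary`) are hypothetical, the example isolate labels are invented for illustration, and the sketch assumes labels follow the common `emm<type>.<subtype>` form (for example `emm1.0`), where the part before the dot is the emm type. The final lines only recompute the two percentages reported in the abstract from its stated counts.

```python
def split_emm(label):
    """Return (emm_type, full_label) for a label such as 'emm12.4'.

    Assumes the 'emm<type>.<subtype>' convention; a label without a dot
    is treated as a bare type.
    """
    label = label.strip().lower()
    emm_type = label.split(".", 1)[0]
    return emm_type, label


def classify(predicted, reference):
    """Classify one WGS prediction against the gold standard result."""
    p_type, p_full = split_emm(predicted)
    r_type, r_full = split_emm(reference)
    if p_full == r_full:
        return "subtype"      # concordant to emm subtype level
    if p_type == r_type:
        return "type"         # concordant to emm type level only
    return "discordant"


def concordance_summary(pairs):
    """Count and percentage (1 d.p.) for each concordance category."""
    counts = {"subtype": 0, "type": 0, "discordant": 0}
    for predicted, reference in pairs:
        counts[classify(predicted, reference)] += 1
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no isolate pairs supplied")
    return {k: (n, round(100 * n / total, 1)) for k, n in counts.items()}


# Invented example pairs (WGS prediction, gold standard result).
toy_pairs = [
    ("emm1.0", "emm1.0"),
    ("emm89.0", "emm89.0"),
    ("emm12.4", "emm12.0"),
    ("emm4.0", "emm28.0"),
]
print(concordance_summary(toy_pairs))

# Recompute the percentages reported in the abstract from its counts.
print(round(100 * 2852 / 3040, 1))  # subtype-level concordance
print(round(100 * 72 / 3040, 1))    # type-level concordance
```

Separating subtype-level from type-level agreement keeps the two figures in the abstract disjoint: an isolate is counted at type level only when its subtype differs from the gold standard result.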
