The genomic and transcriptomic landscape of clinical Escherichia coli and Pseudomonas aeruginosa isolates

Rapid advances in DNA sequencing technology have generated large amounts of genomic data. With efficient computational platforms, these data offer many opportunities to improve our knowledge of species evolution and genetic makeup. The general interest of this thesis is to facilitate studies of important biological questions by extracting the relevant information from transcriptomic and genomic data. The aims of my thesis were i) to develop a pan-genome-based RNA-Seq data analysis pipeline to analyze ex vivo gene expression profiles of uropathogenic Escherichia coli isolates and ii) to create a consensus sequence of the Pseudomonas aeruginosa core genome to identify single nucleotide polymorphisms (SNPs) with high accuracy and to find patho-adaptive mutations in P. aeruginosa clinical isolates. To address these aims, I developed a pan-genome of E. coli and used it to map and analyze RNA-Seq reads associated with acute urinary tract infection. Whereas the in vivo expression profiles of the majority of genes were conserved among the 21 E. coli strains, the expression profiles of the accessory genome were diverse and reflected phylogenetic relationships. In addition, whole-genome sequencing data were used to gain insights into the genetic variation of 99 clinical P. aeruginosa isolates. I created a consensus sequence for every core gene based on the most frequent nucleotide at each position and used it as the reference for identifying SNPs across all clinical isolates. The identified SNPs were classified into clonal-specific, single and phylogenetically independent SNPs. The majority of the SNPs were either clonal-dependent or single SNPs. However, I identified a large set of 2,252 genes that carried one or more phylogenetically independent non-synonymous mutations. Moreover, the dN/dS ratios of 3,814 genes indicated that the core genome is not under selection pressure.
In summary, this thesis explores pan-genome-based and consensus-sequence-based approaches to transcriptomic and genomic sequencing data of clinical isolates of E. coli and P. aeruginosa, respectively. The results contribute to the understanding of sequence variations that are selected in the environment of the human host and lead to bacterial adaptation and pathogenicity. This is important not only for basic scientific research but also for understanding the link between diversity and community structure and function.
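
The consensus-based SNP identification described above can be illustrated with a minimal Python sketch. It is not the pipeline used in the thesis. It assumes that the sequences of one core gene are already aligned to equal length, and the function names (consensus_sequence, call_snps), the isolate labels and the toy sequences are hypothetical. Ties between equally frequent nucleotides are broken alphabetically here, which is an arbitrary choice made only to keep the example deterministic.

```python
from collections import Counter

BASES = "ACGT"


def consensus_sequence(aligned):
    """Return the most frequent nucleotide per column of an alignment.

    `aligned` maps an isolate name to its aligned sequence of one gene.
    Gaps and ambiguous bases are ignored; a column without any called
    base is reported as 'N'.
    """
    if len({len(seq) for seq in aligned.values()}) != 1:
        raise ValueError("sequences must be aligned to the same length")
    consensus = []
    for column in zip(*aligned.values()):
        counts = Counter(base for base in column if base in BASES)
        if not counts:
            consensus.append("N")
            continue
        # Assumption: alphabetical tie-break among equally frequent bases.
        consensus.append(max(sorted(counts), key=counts.get))
    return "".join(consensus)


def call_snps(aligned, reference):
    """List (1-based position, reference base, isolate base) per isolate."""
    snps = {}
    for name, seq in aligned.items():
        snps[name] = [
            (pos + 1, ref, alt)
            for pos, (ref, alt) in enumerate(zip(reference, seq))
            if ref in BASES and alt in BASES and ref != alt
        ]
    return snps


# Toy alignment of one hypothetical core gene in four isolates.
aligned = {
    "isolate_A": "ATGCCA",
    "isolate_B": "ATGCTA",
    "isolate_C": "ATACCA",
    "isolate_D": "ATGCCA",
}
reference = consensus_sequence(aligned)
snps = call_snps(aligned, reference)
```

In this toy alignment the consensus equals the sequence shared by isolates A and D, so only isolates B and C carry a SNP each. Using a majority-based reference of this kind, rather than a single reference strain, means that a variant is reported only where an isolate departs from what most isolates share.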
