A profile-based method for identifying functional divergence of orthologous genes in bacterial genomes

Motivation: Next-generation sequencing technologies have provided us with a wealth of information on genetic variation, but predicting the functional significance of this variation is a difficult task. While many comparative genomics studies have focused on gene flux and large-scale changes, relatively little attention has been paid to quantifying the effects of single nucleotide polymorphisms and indels on protein function, particularly in bacterial genomics.

Results: We present a hidden Markov model-based approach, which we call delta-bitscore (DBS), for identifying orthologous proteins that have diverged at the amino acid sequence level in a way that is likely to impact biological function. We benchmark this approach with several widely used datasets and apply it in a proof-of-concept study of orthologous proteomes, investigating host adaptation in Salmonella enterica. We highlight the value of the method in identifying functional divergence of genes, and suggest that this tool may be a better approach than the commonly used dN/dS metric for identifying functionally significant genetic changes occurring in recently diverged organisms.

Availability and Implementation: A program implementing DBS for pairwise genome comparisons is freely available at: https://github.com/UCanCompBio/deltaBS.

Contact: nicole.wheeler@pg.canterbury.ac.nz or lars.barquist@uni-wuerzburg.de

Supplementary information: Supplementary data are available at Bioinformatics online.
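
As a rough illustration of the delta-bitscore idea described in the abstract, the sketch below assumes that each protein in a pair of orthologous proteomes has already been scored against the same profile hidden Markov model (the scoring step itself is not shown), and that DBS is taken as the reference bitscore minus the comparison bitscore. The sign convention, the function names, the gene identifiers and the scores are all illustrative assumptions; they are not taken from the deltaBS program.

```python
from typing import Dict, List, Tuple


def delta_bitscores(ref: Dict[str, float], alt: Dict[str, float]) -> Dict[str, float]:
    """Return reference-minus-comparison bitscore for each ortholog ID present in both inputs.

    Assumption: both bitscores for a given ID come from the same profile HMM.
    """
    shared = ref.keys() & alt.keys()
    return {gene: ref[gene] - alt[gene] for gene in shared}


def rank_by_divergence(dbs: Dict[str, float]) -> List[Tuple[str, float]]:
    """Sort orthologs by absolute delta-bitscore, largest change first."""
    return sorted(dbs.items(), key=lambda item: abs(item[1]), reverse=True)


# Hypothetical bitscores for two proteomes (made-up values for illustration only).
ref_scores = {"geneA": 310.5, "geneB": 120.0, "geneC": 88.2}
alt_scores = {"geneA": 309.9, "geneB": 75.5, "geneD": 40.0}

dbs = delta_bitscores(ref_scores, alt_scores)
ranked = rank_by_divergence(dbs)

for gene, score in ranked:
    print(f"{gene}\t{score:.1f}")
```

Under these assumptions, a large positive value marks a protein that fits its family model much worse in the comparison genome than in the reference. Genes without an ortholog in both proteomes (here `geneC` and `geneD`) are simply skipped in this sketch.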
