SigniSite: Identification of residue-level genotype-phenotype correlations in protein multiple sequence alignments

Identifying which mutation(s) within a given genotype are responsible for an observable phenotype is important in many aspects of molecular biology. Here, we present SigniSite, an online application for subgroup-free residue-level genotype–phenotype correlation. In contrast to similar methods, SigniSite does not require any pre-definition of subgroups or binary classification. The input is a set of protein sequences in which each sequence has an associated real number quantifying a given phenotype. SigniSite then identifies which amino acid residues are significantly associated with the phenotype. As output, SigniSite displays a sequence logo depicting the strength of the phenotype association of each residue, and a heat-map identifying ‘hot’ or ‘cold’ regions. SigniSite was benchmarked against SPEER, a state-of-the-art method for the prediction of specificity determining positions (SDPs), on two data sets: human immunodeficiency virus protease-inhibitor genotype–phenotype data with corresponding resistance mutation scores from the Stanford University HIV Drug Resistance Database, and a set of protein families with experimentally annotated SDPs. On both data sets, SigniSite was found to outperform SPEER. SigniSite is available at: http://www.cbs.dtu.dk/services/SigniSite/.
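
The abstract does not spell out the statistical test, so the following is only an illustrative sketch of what subgroup-free, residue-level association could look like, under stated assumptions: phenotype values are converted to ranks, and for each residue at each alignment column the mean rank of the sequences carrying that residue is compared with its expectation under random assignment (a rank-sum style Z-score), followed by a Bonferroni correction. The function names (`average_ranks`, `residue_z_scores`, `bonferroni_p`) and the choice of statistic are hypothetical and are not taken from the SigniSite server.

```python
import math
from collections import defaultdict


def average_ranks(values):
    """Rank values from 1..N, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def residue_z_scores(alignment, phenotypes):
    """Return {(column, residue): z} for an aligned set of sequences.

    Assumption: z compares the mean phenotype rank of sequences carrying
    the residue with the mean expected for a random subset of equal size
    (variance formula ignores ties).
    """
    n_total = len(alignment)
    ranks = average_ranks(phenotypes)
    mu = (n_total + 1) / 2
    scores = {}
    for pos in range(len(alignment[0])):
        groups = defaultdict(list)
        for seq, rank in zip(alignment, ranks):
            groups[seq[pos]].append(rank)
        for residue, group in groups.items():
            n = len(group)
            if residue == "-" or n == n_total:
                continue  # skip gaps and fully conserved columns
            sigma = math.sqrt((n_total + 1) * (n_total - n) / (12 * n))
            scores[(pos + 1, residue)] = (sum(group) / n - mu) / sigma
    return scores


def bonferroni_p(z, n_tests):
    """Two-sided normal p-value for z, Bonferroni-corrected for n_tests."""
    p = math.erfc(abs(z) / math.sqrt(2))
    return min(1.0, p * n_tests)
```

In this sketch a positive Z-score marks a residue enriched among sequences with high phenotype values and a negative one marks a residue enriched among low values, which is the kind of signed, per-residue quantity that a two-sided sequence logo could display. No binary split of the sequences is needed, since the ranks carry the full ordering of the phenotype.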
