A tree-based conservation scoring method for short linear motifs in multiple alignments of protein sequences

Background: The structure of many eukaryotic cell regulatory proteins is highly modular. They are assembled from globular domains, segments of natively disordered polypeptide and short linear motifs. The latter are involved in protein interactions and the formation of regulatory complexes. The function of such proteins, which may be difficult to define, is the aggregate of the subfunctions of the modules. It is therefore desirable to predict linear motifs efficiently and with some degree of accuracy, yet sequence database searches for such short patterns return matches that are not statistically significant.

Results: We have developed a method for scoring the conservation of linear motif instances. It requires only information derived from the primary sequence (e.g. a multiple alignment and a sequence tree) and takes into account the degenerate nature of linear motif patterns. In our benchmark, the method accurately scores 86% of the known positive instances, while distinguishing them from random matches in 78% of the cases. The conservation score is implemented as a real-time application designed to be integrated into other tools. It is currently accessible via a Web Service or through a graphical interface.

Conclusion: The conservation score improves the prediction of linear motifs by discarding those matches that are unlikely to be functional because they have not been conserved during the evolution of the protein sequences. It is especially useful for instances in non-structured regions of proteins, where a domain-masking filtering strategy is not applicable.
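To make the idea in the Results concrete, the following is a minimal sketch of a pattern-aware conservation score. It is not the published algorithm: it assumes that per-sequence weights have already been derived from the sequence tree by some other means, and that a homolog "conserves" the instance when its residues aligned to the query instance still match the degenerate motif pattern. The function name `motif_conservation`, its parameters, and the toy alignment and pattern are all hypothetical.

```python
import re


def motif_conservation(alignment, query, start, end, pattern, weights=None):
    """Weighted fraction of homologs that still match the motif pattern.

    alignment: dict mapping sequence name -> gapped sequence (equal lengths)
    query:     name of the sequence carrying the motif instance
    start,end: 0-based, half-open ungapped coordinates of the instance
    pattern:   regular expression describing the (degenerate) motif
    weights:   dict name -> non-negative weight, assumed to be precomputed
               from the tree; every homolog counts 1.0 when omitted
    """
    regex = re.compile(pattern)

    # Map ungapped query coordinates onto alignment columns.
    columns = [i for i, ch in enumerate(alignment[query]) if ch != "-"]
    col_start, col_end = columns[start], columns[end - 1] + 1

    total = conserved = 0.0
    for name, seq in alignment.items():
        if name == query:
            continue
        w = 1.0 if weights is None else weights[name]
        # Residues aligned to the instance, with gap characters removed.
        segment = seq[col_start:col_end].replace("-", "")
        total += w
        # Matching the pattern, not the exact residues, tolerates the
        # substitutions that a degenerate motif allows.
        if regex.fullmatch(segment):
            conserved += w
    return conserved / total if total else 0.0


# Toy data (invented for illustration only).
toy_alignment = {
    "query": "MA-KRLLT",
    "homA": "MAGRQLLS",
    "homB": "MS-KRALT",
    "homC": "MT-RSLLT",
}
toy_weights = {"homA": 0.5, "homB": 0.3, "homC": 0.2}
score = motif_conservation(toy_alignment, "query", 2, 5, r"[RK].L", toy_weights)
print(f"conservation score: {score:.2f}")
```

Scoring against the pattern rather than against the query residues is the design choice that reflects the degenerate nature of linear motifs: a homolog carrying `RQL` or `RSL` counts as conserving a `KRL` instance of the toy pattern `[RK].L`, whereas `KRA` does not.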
