SAPA tool: finding protein regions by combination of amino acid composition, scaled profiles, patterns and rules

SUMMARY Functional modules within protein sequences are often extracted with consensus sequence patterns that represent a linear motif. Other functional regions, however, can only be described by a combination of features such as amino acid composition, profiles of amino acid properties and randomly distributed short sequence motifs. When only a small number of functional examples are well characterized, the researcher needs a tool that extracts similar sequences for further investigation.

AVAILABILITY AND IMPLEMENTATION We provide the web application 'SAPA tool', which allows the user to search with combined properties, ranks the extracted target regions by an integrated score, estimates false discovery rates by using decoy sequences and provides the results as a sequence file or spreadsheet. The source code, the user manual and the web application, which is implemented in Perl, HTML, CSS and JavaScript and runs on Apache, are freely available at http://sapa-tool.uio.no/sapa/
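The following Python sketch illustrates the general idea of searching with combined properties and estimating a false discovery rate from decoy sequences. It is not the SAPA tool's implementation (which is written in Perl): the function names, the rule set, the additive score and the use of reversed sequences as decoys are all assumptions made for illustration. The Kyte-Doolittle hydropathy scale stands in for an arbitrary amino acid property scale.

```python
import re

# Kyte-Doolittle hydropathy values, used here as an example property scale.
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def composition(window, residues):
    """Fraction of positions in `window` occupied by any of `residues`."""
    return sum(aa in residues for aa in window) / len(window)


def mean_scale(window, scale):
    """Mean property value over `window`; unknown residues count as 0."""
    return sum(scale.get(aa, 0.0) for aa in window) / len(window)


def scan(seq, width, residues, min_fraction, motif, max_hydropathy):
    """Return (start, score) for each window that passes all three rules.

    Hypothetical rules: enough of the chosen residues, at least one
    motif match, and a mean hydropathy at or below the cutoff.
    """
    pattern = re.compile(motif)
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        frac = composition(window, residues)
        n_motifs = len(pattern.findall(window))  # non-overlapping matches
        if frac < min_fraction or n_motifs == 0:
            continue
        if mean_scale(window, KD) > max_hydropathy:
            continue
        # Assumed scoring: composition plus a small bonus per motif.
        hits.append((start, frac + 0.1 * n_motifs))
    return hits


def estimate_fdr(targets, threshold, **rules):
    """Decoy hits divided by target hits at or above `threshold`.

    Decoys are the reversed target sequences (an assumption; shuffled
    sequences are another common choice).
    """
    decoys = [s[::-1] for s in targets]
    n_target = sum(score >= threshold
                   for s in targets for _, score in scan(s, **rules))
    n_decoy = sum(score >= threshold
                  for s in decoys for _, score in scan(s, **rules))
    return n_decoy / n_target if n_target else 0.0


# Example rule set: proline/serine-rich, hydrophilic windows with "SP".
RULES = dict(width=8, residues="PS", min_fraction=0.5,
             motif="SP", max_hydropathy=0.0)
```

Calling `scan(sequence, **RULES)` returns the candidate regions with their scores, and `estimate_fdr(sequences, threshold, **RULES)` gives a rough estimate of how many of the regions above a score threshold would be expected by chance. Raising the threshold trades sensitivity for a lower estimated false discovery rate.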
