PIRSitePredict for protein functional site prediction using position-specific rules

Abstract

Methods for predicting ‘global’ protein annotations (such as molecular function, biological process, domain content or family membership) have reached a relatively mature stage. Methods that provide fine-grained ‘local’ annotation of functional sites, at the level of individual amino acid residues, are now coming to the forefront, especially in light of the rapid accumulation of genetic variant data. We have developed a computational method and workflow that predicts functional sites within proteins using position-specific conditional template annotation rules, called PIR Site Rules (PIRSRs). These rules are curated by structural biologists through review of known protein structures and other experimental data, and they are used to generate high-quality annotations for the unreviewed section of the UniProt Knowledgebase (UniProtKB). To share the PIRSR functional site prediction method with the broader scientific community, we have streamlined our workflow and developed a stand-alone Java software package named PIRSitePredict. We demonstrate the use of PIRSitePredict for functional annotation of de novo assembled genomes and transcriptomes by annotating uncharacterized proteins from Trinity RNA-seq assemblies of the embryonic transcriptomes of three cartilaginous fishes: Leucoraja erinacea (Little Skate), Scyliorhinus canicula (Small-spotted Catshark) and Callorhinchus milii (Elephant Shark). On average, about 1,200 lines of annotation were predicted for each species.
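The abstract describes PIRSRs only as position-specific conditional template annotation rules. The Python sketch below illustrates that general idea under stated assumptions; it is not PIRSitePredict's code, and it does not reproduce the actual PIRSR rule format. It assumes the query protein has already been aligned to a rule's template sequence by some external aligner (not shown), and it represents a rule as a list of hypothetical `SiteCondition` records. Template positions are mapped through the alignment, and the site annotation is propagated to the query only when every position-specific condition holds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteCondition:
    """One position-specific condition of a HYPOTHETICAL site rule."""
    template_pos: int      # 1-based residue position in the template sequence
    allowed: frozenset     # residues accepted at this position, e.g. frozenset("C")
    label: str             # annotation to propagate, e.g. "metal binding"


def map_positions(aligned_template: str, aligned_query: str) -> dict:
    """Map 1-based template positions to 1-based query positions,
    given a pairwise alignment as two equal-length gapped strings ('-' = gap)."""
    if len(aligned_template) != len(aligned_query):
        raise ValueError("aligned sequences must have equal length")
    mapping = {}
    t_pos = q_pos = 0
    for t_char, q_char in zip(aligned_template, aligned_query):
        if t_char != "-":
            t_pos += 1
        if q_char != "-":
            q_pos += 1
        # Only columns where both sequences have a residue yield a mapping.
        if t_char != "-" and q_char != "-":
            mapping[t_pos] = q_pos
    return mapping


def apply_rule(conditions, aligned_template: str, aligned_query: str) -> list:
    """Return [(query_pos, residue, label), ...] if ALL conditions hold, else [].

    The rule is conditional: one unmapped or mismatched position rejects it.
    """
    mapping = map_positions(aligned_template, aligned_query)
    query = aligned_query.replace("-", "")
    hits = []
    for cond in conditions:
        q_pos = mapping.get(cond.template_pos)
        if q_pos is None:
            return []  # the template site falls in a gap of the query
        residue = query[q_pos - 1]
        if residue not in cond.allowed:
            return []  # the query residue does not satisfy the condition
        hits.append((q_pos, residue, cond.label))
    return hits


# Usage with a toy, made-up rule and alignment:
rule = [
    SiteCondition(3, frozenset("C"), "metal binding"),
    SiteCondition(6, frozenset("H"), "metal binding"),
]
print(apply_rule(rule, "MKCAGHL", "MKC-GHL"))
# prints [(3, 'C', 'metal binding'), (5, 'H', 'metal binding')]
```

Requiring all conditions to pass before anything is annotated reflects the "conditional" part of the rules as described above: a query that conserves only some of the template's site residues receives no site annotation from this rule.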
