NRProF: Neural response based protein function prediction algorithm

A large amount of proteomic data is being generated owing to advances in high-throughput genome sequencing, but the rate of functional annotation of these sequences lags far behind. To close the gap between the number of sequences and their annotations, fast and accurate automated annotation methods are required. Many methods, such as GOblet, GoFigure, and GOtcha, are built on BLAST searches. Unfortunately, the sequence coverage of these methods is low because they cannot detect remote homologues. This lack of annotation coverage calls for novel methods to improve protein function prediction. Here we present an automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. The main idea of this algorithm is to define a distance metric that corresponds to the similarity of subsequences and reflects how the human brain distinguishes different sequences. Given a query protein, we predict the most similar target protein using a two-layered neural response algorithm and thereby assign the GO term of the target protein to the query. Our method predicted and ranked the actual leaf GO term among the top 5 probable GO terms with 87.66% accuracy. Results of the 5-fold cross validation and the comparison with the PFP and FFPred servers indicate the strong performance of our method. The NRProF program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.
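The two-layer idea described above can be illustrated with a minimal Python sketch. It is not the NRProF implementation: the first-layer kernel (fraction of matching residues), the patch sizes `u` and `v`, the template sets, and the toy sequences and term labels are all assumptions chosen only to keep the example small. Each layer records, for every template, the best similarity found over all windows of the sequence, and two sequences are compared by the normalized inner product of those response vectors.

```python
import math

def windows(seq, size):
    """All contiguous substrings of the given size."""
    return [seq[i:i + size] for i in range(len(seq) - size + 1)]

def base_kernel(a, b):
    """Assumed first-layer similarity: fraction of matching positions."""
    return sum(x == y for x, y in zip(a, b)) / len(a)

def normalized_dot(x, y):
    """Cosine of two response vectors (0.0 if either is all zeros)."""
    nx = math.sqrt(sum(p * p for p in x))
    ny = math.sqrt(sum(q * q for q in y))
    if nx == 0 or ny == 0:
        return 0.0
    return sum(p * q for p, q in zip(x, y)) / (nx * ny)

def response(seq, templates, size, kernel):
    """For each template, the best kernel value over all windows of seq."""
    return [max(kernel(w, t) for w in windows(seq, size)) for t in templates]

def similarity(a, b, templates_u, templates_v, u, v):
    """Two-layer similarity between sequences a and b (both of length >= v)."""
    def kernel_v(p, q):
        # Second-layer kernel on size-v patches, derived from size-u responses.
        return normalized_dot(response(p, templates_u, u, base_kernel),
                              response(q, templates_u, u, base_kernel))
    return normalized_dot(response(a, templates_v, v, kernel_v),
                          response(b, templates_v, v, kernel_v))

def rank_terms(query, annotated, templates_u, templates_v, u, v, top=5):
    """Distinct terms of annotated (sequence, term) pairs, most similar first."""
    scored = sorted(
        ((similarity(query, seq, templates_u, templates_v, u, v), term)
         for seq, term in annotated),
        key=lambda pair: pair[0], reverse=True)
    ranked = []
    for _, term in scored:
        if term not in ranked:
            ranked.append(term)
    return ranked[:top]

# Toy data; sequences, templates, and term labels are hypothetical.
templates_u = ["AK", "GL", "VV"]
templates_v = ["AKGL", "VVAK"]
annotated = [("VVVVVV", "term_B"), ("MAKGLV", "term_A")]
print(rank_terms("MAKGLV", annotated, templates_u, templates_v, u=2, v=4))
```

In this sketch the query inherits the terms of the closest annotated sequences, which mirrors the transfer step in the abstract; a realistic setting would draw templates from a protein database and use a substitution-aware first-layer kernel.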
