A novel neural response algorithm for protein function prediction

Background: High-throughput genome sequencing methods are generating large amounts of data, but experimental functional characterization lags far behind. Closing the gap between the number of sequences and their annotations requires fast and accurate automated annotation methods. Many existing methods, such as GOblet, GoFigure, and GOtcha, are built on BLAST searches. Unfortunately, the sequence coverage of these methods is low because they cannot detect remote homologues. This limitation, together with a lack of annotation specificity, motivates the need for better automated protein function prediction.

Results: We designed a novel automated protein functional assignment method based on the neural response algorithm, which simulates the neuronal behavior of the visual cortex in the human brain. For a given query protein, we first predict the most similar target protein and then assign that target's GO term to the query sequence. When assessed on the test set, our method ranked the actual leaf GO term among the top 5 probable GO terms with an accuracy of 86.93%.

Conclusions: The proposed algorithm is the first application of the neural response algorithm in the biological domain. Using HMM profiles together with secondary structure information to define the neural response gives our method an edge in annotation accuracy over other available methods. The results of 5-fold cross validation and a comparison with the PFP and FFPred servers indicate the strong performance of our method. The program, the dataset, and help files are available at http://www.jjwanglab.org/NRProF/.
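The assignment and evaluation steps described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: the function names are hypothetical, and `shared_3mer_similarity` is a toy stand-in for the neural response similarity, which in the paper is defined from HMM profiles and secondary structure information. The sketch only shows how GO terms could be ranked by similarity to annotated target proteins and how a top-k accuracy could then be computed.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def shared_3mer_similarity(a: str, b: str) -> float:
    """Toy stand-in similarity: Jaccard overlap of the 3-mers of two sequences.

    Assumption for illustration only; it is NOT the neural response kernel.
    """
    kmers_a = {a[i:i + 3] for i in range(len(a) - 2)}
    kmers_b = {b[i:i + 3] for i in range(len(b) - 2)}
    union = kmers_a | kmers_b
    return len(kmers_a & kmers_b) / len(union) if union else 0.0


def rank_go_terms(
    query: str,
    targets: Dict[str, str],
    similarity: Callable[[str, str], float],
    k: int = 5,
) -> List[str]:
    """Return up to k distinct GO terms, ordered by the best similarity
    between the query and any target sequence annotated with that term.

    `targets` maps a target sequence to its (single) GO term.
    """
    best: Dict[str, float] = {}
    for target_seq, go_term in targets.items():
        score = similarity(query, target_seq)
        if score > best.get(go_term, float("-inf")):
            best[go_term] = score
    # Sort by descending score; break ties by term ID for determinism.
    ranked = sorted(best, key=lambda term: (-best[term], term))
    return ranked[:k]


def top_k_accuracy(
    test_set: Sequence[Tuple[str, str]],
    targets: Dict[str, str],
    similarity: Callable[[str, str], float],
    k: int = 5,
) -> float:
    """Fraction of (query, true GO term) pairs whose true term is in the top k."""
    hits = sum(
        1
        for query, true_term in test_set
        if true_term in rank_go_terms(query, targets, similarity, k)
    )
    return hits / len(test_set)
```

With `k=5`, `top_k_accuracy` corresponds to the kind of "top 5" figure reported above, although the reported 86.93% depends on the authors' similarity measure and dataset, neither of which is reproduced here.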
