Position-specific annotation of protein function based on multiple homologs.

In this work I present an algorithm for deriving position-specific functional annotations of proteins. The input is the result of a sequence similarity search of the query sequence against a sequence database. Strings of words are extracted from the descriptions of the proteins, and the correlation between proteins sharing the same descriptors and amino acid conservation is used to compute a score indicating which descriptor is likely to better describe the function of each particular residue. Analysis of the score curves and comparison of different functions allow easy detection of the parts of the sequence associated with different functions. Different levels of functional specificity can be compared, allowing the user to choose the one that best suits the function of the protein. Immediate applications of this algorithm are support for (automated) methods of protein functional annotation and database coherence checking.
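To make the idea concrete, the following is a minimal Python sketch, not the scoring function of this work. It assumes the homologs are already aligned to the query, column by column, with `-` for gaps. It takes descriptors to be contiguous strings of up to two words, and it uses a simple contrast as the score: the fraction of homologs carrying a descriptor that conserve the query residue, minus the same fraction among homologs lacking it. The function names and the toy data are hypothetical.

```python
from collections import defaultdict


def extract_descriptors(description, max_words=2):
    """Return all contiguous word strings of 1..max_words words (assumed descriptor form)."""
    words = description.lower().split()
    return {
        " ".join(words[i:i + n])
        for n in range(1, max_words + 1)
        for i in range(len(words) - n + 1)
    }


def position_scores(query, homologs, max_words=2):
    """Score each descriptor at each query position.

    homologs: list of (aligned_sequence, description) pairs, each sequence
    aligned to the query (same length, '-' for gaps).
    Returns {descriptor: [score per position]}, each score in [-1, 1].
    """
    carriers = defaultdict(set)  # descriptor -> indices of homologs carrying it
    for idx, (seq, desc) in enumerate(homologs):
        if len(seq) != len(query):
            raise ValueError("homolog %d is not aligned to the query" % idx)
        for descriptor in extract_descriptors(desc, max_words):
            carriers[descriptor].add(idx)

    # conserved[i][pos] is True when homolog i has the query residue at pos.
    conserved = [
        [q == h and q != "-" for q, h in zip(query, seq)]
        for seq, _ in homologs
    ]

    total = len(homologs)
    scores = {}
    for descriptor, with_d in carriers.items():
        without_d = [i for i in range(total) if i not in with_d]
        curve = []
        for pos in range(len(query)):
            frac_with = sum(conserved[i][pos] for i in with_d) / len(with_d)
            frac_without = (
                sum(conserved[i][pos] for i in without_d) / len(without_d)
                if without_d else 0.0
            )
            # Illustrative contrast score (an assumption of this sketch).
            curve.append(frac_with - frac_without)
        scores[descriptor] = curve
    return scores


def best_descriptor_per_position(scores):
    """For each position, pick the descriptor with the highest score."""
    length = len(next(iter(scores.values())))
    return [max(scores, key=lambda d: scores[d][pos]) for pos in range(length)]


# Toy example: the first half of the query is conserved in the "kinase"
# homologs, the second half in the "dna" homologs.
query = "MKTAYG"
homologs = [
    ("MKTCCC", "protein kinase"),
    ("MKTDDD", "serine kinase"),
    ("EEEAYG", "dna binding"),
    ("FFFAYG", "dna helicase"),
]
scores = position_scores(query, homologs)
print(best_descriptor_per_position(scores))
```

In this toy input the broader descriptors ("kinase", "dna") score higher than the more specific ones ("protein kinase", "dna binding") because more homologs support them, which hints at how different levels of functional specificity could be compared along the sequence.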
