Prediction of enzyme family classes.

The classes of newly found enzyme sequences are usually determined either by biochemical analysis of eukaryotic and prokaryotic genomes or by microarray chips. Both experimental methods are time-consuming and costly. With the explosion of protein sequences entering databanks, it is highly desirable to explore whether newly found enzyme sequences can be classified into their respective enzyme classes by an automated method. This matters because knowing which family or subfamily an enzyme belongs to may help deduce its catalytic mechanism and specificity, giving clues to the relevant biological function. In this study, a bioinformatic analysis was conducted on 2640 oxidoreductases classified into 16 subclasses according to the types of substrates they act on during the catalytic process. Although the problem is extremely complicated and might involve knowledge of three-dimensional structure as well as many other physicochemical factors, quite promising results were obtained, indicating that the family or subfamily of an enzyme is predictable to a considerable degree by a sequence-based approach alone, provided a good training dataset can be established.
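
To make the idea of a sequence-only classifier concrete, the following is a minimal sketch and not the method of this study. It assumes amino acid composition as the feature vector and a nearest-centroid rule with Euclidean distance. The abstract specifies neither choice, and the subclass labels and toy sequences are hypothetical.

```python
from collections import Counter
import math

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def composition(seq):
    """Return the 20-D amino acid composition (fractions that sum to 1)."""
    counts = Counter(c for c in seq.upper() if c in AMINO_ACIDS)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sequence contains no standard amino acids")
    return [counts[a] / total for a in AMINO_ACIDS]


def centroids(training):
    """Map each subclass label to the mean composition of its sequences.

    `training` is a dict: label -> list of sequences (assumed non-empty).
    """
    result = {}
    for label, seqs in training.items():
        vectors = [composition(s) for s in seqs]
        result[label] = [sum(col) / len(vectors) for col in zip(*vectors)]
    return result


def predict(seq, cents):
    """Assign `seq` to the subclass whose centroid is nearest (Euclidean)."""
    v = composition(seq)
    return min(cents, key=lambda label: math.dist(v, cents[label]))


# Hypothetical toy data; a real training set would hold curated enzyme
# sequences labelled with their actual subclasses.
training = {
    "subclass_A": ["AAAAGGGA", "AAGGAAAA"],
    "subclass_B": ["KKKKEEKK", "EEKKKKEK"],
}
cents = centroids(training)
print(predict("AAGAAGAA", cents))
```

In a sketch like this, the quality of the prediction depends entirely on how representative the training sequences are for each subclass, which is the condition the abstract attaches to its result.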
