Predicting protein subcellular location using digital signal processing

The biological functions of a protein are closely related to its location within the cell. With the rapid accumulation of newly found protein sequences in databanks, it is highly desirable to develop an automated method for predicting the subcellular location of proteins. Such a predictor will expedite the functional determination of newly found proteins and the prioritization of genes and proteins identified by genomic efforts as potential molecular targets for drug design. Traditional algorithms for predicting subcellular location were based solely on amino acid composition and took no account of sequence order. To improve prediction quality, this sequence order effect must be incorporated. However, the number of possible sequence patterns in proteins is extremely large, which makes this goal formidable. To address this difficulty, a well-developed tool from digital signal processing, the discrete Fourier transform (DFT) [1], was introduced. Each protein was first translated into a digital signal by replacing every amino acid with its hydrophobicity value and then analyzed by DFT in the frequency domain. The resulting set of frequency spectrum parameters was taken to represent the sequence order effect. Combining these parameters with the conventional amino acid composition significantly improved prediction quality. A key merit of this approach is that many existing tools from mathematics and engineering can be applied directly in the prediction process. It is anticipated that digital signal processing may serve as a useful vehicle in many other areas of protein science.
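The feature construction described above (hydrophobicity signal, DFT, spectrum parameters appended to composition) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The Kyte-Doolittle scale stands in for the hydrophobicity scale, which the abstract does not name. The hypothetical parameter `n_freq` sets how many low-frequency magnitudes are kept. Magnitudes are divided by sequence length so that proteins of different lengths are comparable. The DC term (k = 0) is skipped because it equals the sequence length times the mean hydrophobicity, which the composition already determines.

```python
import cmath
import math

# Kyte-Doolittle hydropathy values. ASSUMPTION: the abstract does not say
# which hydrophobicity scale was used; this scale is a stand-in.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def amino_acid_composition(seq):
    """Conventional 20-D composition: fraction of each standard residue."""
    return [seq.count(aa) / len(seq) for aa in AMINO_ACIDS]


def hydrophobicity_signal(seq):
    """One numeric sample per residue; only the 20 standard residues are supported."""
    return [KYTE_DOOLITTLE[aa] for aa in seq]


def dft(signal):
    """Naive O(N^2) discrete Fourier transform: X[k] = sum_n x[n] * exp(-2*pi*i*k*n/N)."""
    n_samples = len(signal)
    return [
        sum(x * cmath.exp(-2j * math.pi * k * n / n_samples)
            for n, x in enumerate(signal))
        for k in range(n_samples)
    ]


def spectrum_parameters(seq, n_freq):
    """Length-normalized magnitudes of DFT coefficients k = 1..n_freq.

    ASSUMPTION: n_freq, the use of magnitudes, and the 1/N normalization
    are illustrative choices, not taken from the paper.
    """
    spectrum = dft(hydrophobicity_signal(seq))
    n_samples = len(seq)
    return [abs(spectrum[k]) / n_samples for k in range(1, n_freq + 1)]


def feature_vector(seq, n_freq=5):
    """20 composition fractions followed by n_freq spectrum parameters."""
    seq = seq.upper()
    if len(seq) <= n_freq:
        raise ValueError("sequence must be longer than n_freq")
    return amino_acid_composition(seq) + spectrum_parameters(seq, n_freq)
```

For example, `feature_vector("MKTAYIAKQRQISFVKSHFSRQ", n_freq=5)` returns a 25-element list: 20 composition fractions followed by 5 spectrum magnitudes. The naive DFT is quadratic in sequence length. An FFT routine such as `numpy.fft.fft`, which uses the same sign convention, would produce the same coefficients faster.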

[1] K. Nishikawa, et al. Discrimination of intracellular and extracellular proteins using amino acid composition and residue-pair frequencies. Journal of Molecular Biology, 1994.

[2] J. Rigelsford, et al. Pattern Recognition: Concepts, Methods and Applications, 2002.

[3] K. C. Chou, et al. Protein subcellular location prediction. Protein Engineering, 1999.

[4] L. He, et al. Application of Pseudo Amino Acid Composition for Predicting Protein Subcellular Location: Stochastic Signal Processing Approach. Journal of Protein Chemistry, 2003.

[5] N. L. Johnson, et al. Multivariate Analysis. Nature, 1958.

[6] K. C. Chou. A novel approach to predicting protein structural classes in a (20–1)-D amino acid composition space. Proteins, 1995.

[7] P. Aloy, et al. Relation between amino acid composition and cellular location of proteins. Journal of Molecular Biology, 1997.

[8] K. C. Chou. Prediction of protein cellular attributes using pseudo-amino acid composition. Proteins, 2001.

[9] P. Mahalanobis. On the generalized distance in statistics, 1936.

[10] K. C. Chou, et al. Using discriminant function for prediction of subcellular location of prokaryotic proteins. Biochemical and Biophysical Research Communications, 1998.

[11] D. J. DeFatta, et al. Digital Signal Processing: A System Design Approach, 1988.

[12] K. J. Polakowski, et al. A Design Approach to, 1989.

[13] N. S. Wingreen, et al. Are protein folds atypical? Proceedings of the National Academy of Sciences of the United States of America, 1997.

[14] T. Hubbard, et al. Using neural networks for prediction of the subcellular location of proteins. Nucleic Acids Research, 1998.

[15] S. A. Tretter, et al. Introduction to Discrete-Time Signal Processing, 1976.

[16] K. C. Chou, et al. Prediction of protein structural classes and subcellular locations. Current Protein & Peptide Science, 2000.

[17] K. C. Chou, et al. Prediction of membrane protein types and subcellular locations. Proteins, 1999.

[18] H. Herzel, et al. Correlations in protein sequences and property codes. Journal of Theoretical Biology, 1998.