Using Pseudo Amino Acid Composition to Predict Protein Attributes Via Cellular Automata and Other Approaches

With the avalanche of protein sequences generated in the post-genomic age, many typical topics in bioinformatics, proteomics and system biology are relevant to identification of various attributes of uncharacterized proteins or need this kind of knowledge. Unfortunately, it is both time-consuming and costly to acquire the desired information by purely conducting biochemical experiments. Therefore, it is highly desirable to develop automated methods for fast and accurately identifying various attributes of proteins based on their sequences information alone. This is the convergence between bioinformatics and artificial intelligence techniques (AI). To establish powerful computational methods in this regard, one of the key procedures is to find an effective mathematical expression for the protein samples that can truly reflect their intrinsic correlation with the target to be predicted. To realize this, the pseudo amino acid (PseAA) composition or PseAAC was proposed. Stimulated by the concept of PseAAC, a series of different modes of PseAAC were developed to deal with proteins or proteins-related systems. The current review is mainly focused on those PseAAC modes that were formulated via cellular automata. By using some optimal space-time evolvement rules of cellular automata, a protein sequence can be represented by a unique image, the so-called cellular automata (CA) image or CAI. Many important features, which are deeply hidden in piles of long and complicated amino acid sequences, can be clearly revealed through their CAIs. It is anticipated that, owing to its impressive power, intuitiveness and relative simplicity, the CAI approach holds a great potential in bioinformatics and other related areas.

[1]  P. Klein,et al.  Prediction of protein structural class by discriminant analysis. , 1986, Biochimica et biophysica acta.

[2]  Colin G. Johnson,et al.  Colour merging for the visualization of biomolecular sequence data , 2003, Proceedings on Seventh International Conference on Information Visualization, 2003. IV 2003..

[3]  Zhanchao Li,et al.  Using Chou's amphiphilic pseudo-amino acid composition and support vector machine for prediction of enzyme subfamily classes. , 2007, Journal of theoretical biology.

[4]  Kuo-Chen Chou,et al.  Signal-CF: a subsite-coupled and window-fusing approach for predicting signal peptides. , 2007, Biochemical and biophysical research communications.

[5]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001, Proteins.

[6]  C. Zhang,et al.  Diagrammatization of codon usage in 339 human immunodeficiency virus proteins and its biological implication. , 1992, AIDS research and human retroviruses.

[7]  K. Chou,et al.  Graphic rules in steady and non-steady state enzyme kinetics. , 1989, The Journal of biological chemistry.

[8]  Kuo-Chen Chou,et al.  Ensemble classifier for protein fold pattern recognition , 2006, Bioinform..

[9]  Y Cai,et al.  Prediction of protein structural classes by neural network. , 2000, Biochimie.

[10]  J. Nieto,et al.  Use of fuzzy clustering technique and matrices to classify amino acids and its impact to Chou's pseudo amino acid composition. , 2009, Journal of theoretical biology.

[11]  Yang Dai,et al.  Assessing protein similarity with Gene Ontology and its use in subnuclear localization prediction , 2006, BMC Bioinformatics.

[12]  Søren Brunak,et al.  Prediction of human protein function according to Gene Ontology categories , 2003, Bioinform..

[13]  Abraham Lempel,et al.  On the Complexity of Finite Sequences , 1976, IEEE Trans. Inf. Theory.

[14]  Peer Bork,et al.  SMART: a web-based tool for the study of genetically mobile domains , 2000, Nucleic Acids Res..

[15]  Z. Wen,et al.  Using pseudo amino acid composition to predict transmembrane regions in protein: cellular automata and Lempel-Ziv complexity , 2007, Amino Acids.

[16]  G. Li,et al.  Classifying G protein-coupled receptors and nuclear receptors on the basis of protein power spectrum from fast Fourier transform , 2006, Amino Acids.

[17]  Jianding Qiu,et al.  Prediction of G-protein-coupled receptor classes based on the concept of Chou's pseudo amino acid composition: an approach from discrete wavelet transform. , 2009, Analytical biochemistry.

[18]  K. Chou Progress in protein structural class prediction and its impact to bioinformatics and proteomics. , 2005, Current protein & peptide science.

[19]  L. Holm,et al.  The Pfam protein families database , 2005, Nucleic Acids Res..

[20]  C. Munteanu,et al.  Generalized lattice graphs for 2D-visualization of biological information , 2009, Journal of Theoretical Biology.

[21]  Yu-Dong Cai,et al.  Support Vector Machines for predicting protein structural class , 2001, BMC Bioinformatics.

[22]  Kuo-Chen Chou,et al.  GPCR-GIA: a web-server for identifying G-protein coupled receptors and their families with grey incidence analysis. , 2009, Protein engineering, design & selection : PEDS.

[23]  C. Zhang,et al.  A weighting method for predicting protein structural class from amino acid composition. , 1992, European journal of biochemistry.

[24]  K. Chou Prediction of protein cellular attributes using pseudo‐amino acid composition , 2001 .

[25]  Zheng-Zhi Wang,et al.  Classification of G-protein coupled receptors at four levels. , 2006, Protein engineering, design & selection : PEDS.

[26]  G P Zhou,et al.  Some insights into protein structural class prediction , 2001, Proteins.

[27]  Shao-Wu Zhang,et al.  Using the concept of Chou’s pseudo amino acid composition to predict protein subcellular localization: an approach by incorporating evolutionary information and von Neumann entropies , 2008, Amino Acids.

[28]  Narmada Thanki,et al.  CDD: specific functional annotation with the Conserved Domain Database , 2008, Nucleic Acids Res..

[29]  Kuo-Chen Chou,et al.  Using amphiphilic pseudo amino acid composition to predict enzyme subfamily classes , 2005, Bioinform..

[30]  Yanzhi Guo,et al.  Predicting DNA-binding proteins: approached from Chou’s pseudo amino acid composition and other specific sequence features , 2007, Amino Acids.

[31]  Kuldip Singh,et al.  A Time-Series-Based Feature Extraction Approach for Prediction of Protein Structural Class , 2008, EURASIP J. Bioinform. Syst. Biol..

[32]  E. Myers,et al.  Basic local alignment search tool. , 1990, Journal of molecular biology.

[33]  Kuo-Chen Chou,et al.  Predicting protein structural class by functional domain composition. , 2004, Biochemical and biophysical research communications.

[34]  Robert D. Finn,et al.  The Pfam protein families database , 2004, Nucleic Acids Res..

[35]  Jiang Wang,et al.  Prediction of protein structural class with Rough Sets , 2006, BMC Bioinformatics.

[36]  Kuo-Chen Chou,et al.  Predicting enzyme family class in a hybridization space , 2004, Protein science : a publication of the Protein Society.

[37]  Peer Bork,et al.  SMART 6: recent updates and new developments , 2008, Nucleic Acids Res..

[38]  M. Sternberg,et al.  Automated prediction of protein function and detection of functional sites from structure. , 2004, Proceedings of the National Academy of Sciences of the United States of America.

[39]  A. Millar,et al.  Exploring the Function-Location Nexus: Using Multiple Lines of Evidence in Defining the Subcellular Location of Plant Proteins , 2009, The Plant Cell Online.

[40]  K Nishikawa,et al.  The folding type of a protein is relevant to the amino acid composition. , 1986, Journal of biochemistry.

[41]  K. Chou,et al.  EzyPred: a top-down approach for predicting enzyme functional classes and subclasses. , 2007, Biochemical and biophysical research communications.

[42]  Stephen Wolfram,et al.  A New Kind of Science , 2003, Artificial Life.

[43]  Kuo-Chen Chou,et al.  A New Method for Predicting the Subcellular Localization of Eukaryotic Proteins with Both Single and Multiple Sites: Euk-mPLoc 2.0 , 2010, PloS one.

[44]  Kuo-Chen Chou,et al.  GPCR‐CA: A cellular automaton image approach for predicting G‐protein–coupled receptor functional classes , 2009, J. Comput. Chem..

[45]  K. Chou,et al.  2D-MH: A web-server for generating graphic representation of protein sequences based on the physicochemical properties of their constituent amino acids. , 2010, Journal of theoretical biology.

[46]  K. Chou,et al.  An optimization approach to predicting protein structural class from amino acid composition , 1992, Protein science : a publication of the Protein Society.

[47]  Xiaomei Wu,et al.  Prediction of yeast protein–protein interaction network: insights from the Gene Ontology and annotations , 2006, Nucleic acids research.

[48]  C. Zhang,et al.  A graphic approach to analyzing codon usage in 1562 Escherichia coli protein coding sequences. , 1994, Journal of molecular biology.

[49]  K. Chou,et al.  Euk-mPLoc: a fusion classifier for large-scale eukaryotic protein subcellular location prediction by incorporating multiple sites. , 2007, Journal of proteome research.

[50]  C. DeLisi,et al.  Prediction of protein structural class from the amino acid sequence , 1986, Biopolymers.

[51]  Zong Dai,et al.  Prediction of protein structural classes by Chou’s pseudo amino acid composition: approached using continuous wavelet transform and principal component analysis , 2009, Amino Acids.

[52]  K. Chou Applications of graph theory to enzyme kinetics and protein folding kinetics. Steady and non-steady-state systems. , 2020, Biophysical chemistry.

[53]  J. Dorado,et al.  Trypano-PPI: a web server for prediction of unique targets in trypanosome proteome by using electrostatic parameters of protein-protein interactions. , 2010, Journal of proteome research.

[54]  G M Maggiora,et al.  Disposition of amphiphilic helices in heteropolar environments , 1997, Proteins.

[55]  K. Chou Some remarks on protein attribute prediction and pseudo amino acid composition , 2010, Journal of Theoretical Biology.

[56]  Shiow-Fen Hwang,et al.  ProLoc-GO: Utilizing informative Gene Ontology terms for sequence-based prediction of protein subcellular localization , 2008, BMC Bioinformatics.

[57]  Kuo-Chen Chou,et al.  Using pseudo amino acid composition to predict protein structural classes: Approached with complexity measure factor , 2006, J. Comput. Chem..

[58]  Z. Huang,et al.  Using complexity measure factor to predict protein subcellular location , 2005, Amino Acids.

[59]  K. Chou,et al.  Predicting protein fold pattern with functional domain and sequential evolution information. , 2009, Journal of theoretical biology.

[60]  K. Chou A novel approach to predicting protein structural classes in a (20–1)‐D amino acid composition space , 1995, Proteins.

[61]  Kuo-Chen Chou,et al.  GPCR-2L: predicting G protein-coupled receptors and their types by hybridizing two different modes of pseudo amino acid compositions. , 2011, Molecular bioSystems.

[62]  Z. Huang,et al.  Using pseudo amino acid composition to predict protein subcellular location: Approached with Lyapunov index, Bessel function, and Chebyshev filter , 2005, Amino Acids.

[63]  Zhou Genfa,et al.  A weighting method for predicting protein structural class from amino acid composition. , 1992 .

[64]  Thomas L. Madden,et al.  Improving the accuracy of PSI-BLAST protein database searches with composition-based statistics and other refinements. , 2001, Nucleic acids research.

[65]  K. Chou,et al.  Plant-mPLoc: A Top-Down Strategy to Augment the Power for Predicting Plant Protein Subcellular Localization , 2010, PloS one.

[66]  Ming-Kuei Hu,et al.  Visual pattern recognition by moment invariants , 1962, IRE Trans. Inf. Theory.

[67]  K. Chou,et al.  Prediction of protein structural classes. , 1995, Critical reviews in biochemistry and molecular biology.

[68]  B. S. Manjunath,et al.  Using texture to analyze and manage large collections of remote sensed image and video data. , 2004, Applied optics.

[69]  K. Chou,et al.  Predicting protein structural classes with pseudo amino acid composition: an approach using geometric moments of cellular automaton image. , 2008, Journal of theoretical biology.

[70]  Lukasz A. Kurgan,et al.  Prediction of protein structural class using novel evolutionary collocation‐based sequence representation , 2008, J. Comput. Chem..

[71]  K. Chou,et al.  Prediction of protein subcellular locations by GO-FunD-PseAA predictor. , 2004, Biochemical and biophysical research communications.

[72]  Cristian R. Munteanu,et al.  Enzymes/non-enzymes classification model complexity based on composition, sequence, 3D and topological indices. , 2008, Journal of theoretical biology.

[73]  Hao Lin The modified Mahalanobis Discriminant for predicting outer membrane proteins by using Chou's pseudo amino acid composition. , 2008, Journal of theoretical biology.

[74]  Robert M. Haralick,et al.  Textural Features for Image Classification , 1973, IEEE Trans. Syst. Man Cybern..

[75]  Martin C. Frith,et al.  SeqVISTA: a graphical tool for sequence feature visualization and comparison , 2003, BMC Bioinformatics.

[76]  Jen-Reng Object Detection using Geometric Invariant Moment , 2006 .

[77]  Kuo-Chen Chou,et al.  Prediction of protein structure classes with pseudo amino acid composition and fuzzy support vector machine network. , 2007, Protein and peptide letters.

[78]  K. Chou Pseudo Amino Acid Composition and its Applications in Bioinformatics, Proteomics and System Biology , 2009 .

[79]  J. Dorado,et al.  Ontologies of drug discovery and design for neurology, cardiology and oncology. , 2010, Current pharmaceutical design.

[80]  E. Uriarte,et al.  Multi-target QPDR classification model for human breast and colon cancer-related proteins using star graph topological indices , 2008, Journal of Theoretical Biology.