Protein sequence analysis by incorporating modified chaos game and physicochemical properties into Chou's general pseudo amino acid composition.

In this contribution, we introduce a novel graphical method for comparing protein sequences. By mapping a protein sequence into 3D space based on the codons and physicochemical properties of the 20 amino acids, we obtain a unique P-vector from the resulting 3D curve. This approach is consistent with the wobble theory of amino acids. We measure the similarity or dissimilarity between protein sequences by computing the distance between their P-vectors. Finally, we apply our method to four datasets and obtain better results than previous approaches.
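
The general idea of turning a sequence into a 3D curve and comparing sequences through a vector derived from that curve can be illustrated with a minimal sketch. It is not the method of this paper: the abstract does not specify the codon-based mapping or how the P-vector is defined. The sketch assumes a plain chaos game (each point is the midpoint between the previous point and a vertex assigned to the current residue), assumes vertices placed on a unit circle in alphabetical order with the Kyte-Doolittle hydropathy index as the third coordinate, and uses a hypothetical `summary_vector` (per-axis mean and spread of the curve) as a stand-in for the P-vector. All function names are illustrative.

```python
import math

# Kyte-Doolittle hydropathy index, used here as one example of a
# physicochemical property (an assumption; the abstract names none).
HYDROPATHY = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

# Alphabetical ordering of the 20 residues: an arbitrary choice for this sketch.
AMINO_ACIDS = sorted(HYDROPATHY)


def vertex(aa):
    """Hypothetical 3D vertex for a residue: unit circle in x/y, scaled hydropathy in z."""
    k = AMINO_ACIDS.index(aa)
    angle = 2 * math.pi * k / len(AMINO_ACIDS)
    return (math.cos(angle), math.sin(angle), HYDROPATHY[aa] / 4.5)


def chaos_game_curve(seq):
    """Return the 3D curve: each point is halfway between the previous point and the residue's vertex."""
    point = (0.0, 0.0, 0.0)
    curve = []
    for aa in seq:
        v = vertex(aa)
        point = tuple((p + q) / 2 for p, q in zip(point, v))
        curve.append(point)
    return curve


def summary_vector(seq):
    """Stand-in descriptor (NOT the paper's P-vector): per-axis mean and standard deviation."""
    if not seq:
        raise ValueError("sequence must not be empty")
    curve = chaos_game_curve(seq)
    n = len(curve)
    means = [sum(p[i] for p in curve) / n for i in range(3)]
    spreads = [
        math.sqrt(sum((p[i] - means[i]) ** 2 for p in curve) / n)
        for i in range(3)
    ]
    return means + spreads


def distance(seq_a, seq_b):
    """Euclidean distance between the descriptors of two sequences."""
    return math.dist(summary_vector(seq_a), summary_vector(seq_b))
```

Because the descriptor has a fixed length regardless of sequence length, sequences of different lengths can be compared without alignment, which is the property that graphical and alignment-free methods of this kind rely on.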
