A new signal characterization and signal-based Chou's PseAAC representation of protein sequences

Most algorithms for extracting information from the amino acid chains that make up proteins, and for processing those chains, treat them as symbolic chains. Fewer exploit signal processing techniques, which require a numerical representation of the chain. Such techniques, however, are powerful at extracting regularities that cannot be detected in a symbolic chain and that may be important for understanding the biological meaning of a sequence or for classification tasks. In this study, we propose a new mathematical representation of amino acid chains. It is derived from a similarity measure based on the PAM250 amino acid substitution matrix and generates 20 signals for each protein sequence. Using this representation, 20 consensus spectra are determined for a protein family and the relevance of their frequency peaks is established, yielding a group of significant frequency peaks that reveal periodicities common to the amino acid sequences of the family. We also show that the proposed 20-signal representation can be integrated into Chou's pseudo amino acid composition (PseAAC), where it constitutes a useful alternative to amino acid physicochemical properties.
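The abstract does not spell out how the 20 signals are computed. One plausible reading, sketched here purely as an illustration, is that each signal scores every residue of the sequence against one of the 20 reference amino acids, after which a discrete Fourier transform exposes periodicities. The names `toy_similarity`, `sequence_to_signals` and `power_spectrum` are hypothetical, and `toy_similarity` is a placeholder scoring function, not the PAM250 matrix; a real PAM250 lookup would be passed in its place.

```python
import cmath

AMINO_ACIDS = "ARNDCQEGHILKMFPSTWYV"  # the 20 standard residues


def toy_similarity(a, b):
    """Placeholder score (assumption, NOT PAM250): 2 for identity, -1 otherwise."""
    return 2.0 if a == b else -1.0


def sequence_to_signals(seq, similarity=toy_similarity):
    """Map a sequence to 20 numerical signals, one per reference amino acid.

    Signal `ref` holds similarity(ref, residue) at every sequence position.
    """
    return {ref: [similarity(ref, res) for res in seq] for ref in AMINO_ACIDS}


def power_spectrum(signal):
    """Power spectrum of the mean-removed signal for bins 0 .. N//2 (naive DFT)."""
    n = len(signal)
    mean = sum(signal) / n
    centered = [x - mean for x in signal]
    spectrum = []
    for k in range(n // 2 + 1):
        coeff = sum(
            x * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, x in enumerate(centered)
        )
        spectrum.append(abs(coeff) ** 2)
    return spectrum


# A sequence with an obvious period of two residues.
signals = sequence_to_signals("ACACACAC")
spec = power_spectrum(signals["A"])
peak = max(range(len(spec)), key=spec.__getitem__)
print(peak / len(signals["A"]))  # dominant frequency in cycles per residue
```

For the period-2 toy sequence, the dominant peak of the "A" signal falls at 0.5 cycles per residue, which is the kind of shared periodicity the consensus spectra are meant to surface. How the spectra of several family members are combined into a consensus spectrum is not described in the abstract, so that step is left out of the sketch.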
