Predicting membrane proteins and their types by extracting various sequence features into Chou’s general PseAAC

Membrane proteins (MPs) are crucial to many biological functions, which makes them attractive targets for many pharmaceutical agents. It is therefore important to predict MPs accurately using effective and efficient computational models (CMs). Annotating MPs with in vitro analytical techniques is time-consuming and expensive, and in some cases intractable. Automated, CM-based prediction and annotation of MPs has consequently proven useful. This work proposes an MP prediction framework based on computational intelligence and a feature set built from statistical moments. In addition, the previously used dataset has been enlarged with new MPs from the latest release of UniProtKB. Rigorous experiments indicate that combining statistical moments with a multilayer neural network trained by back-propagation yields better prediction results.
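As a rough illustration of what a statistical-moments feature set can look like, the sketch below computes raw and central moments of a numerically encoded protein sequence. The residue encoding (`CODE`), the function name `moment_features`, and the choice of moment orders are assumptions made for this example only; the abstract does not specify the encoding or the moments actually used.

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"
# Hypothetical encoding (an assumption, not taken from the paper):
# each residue maps to its 1-based position in AMINO_ACIDS.
CODE = {aa: i + 1 for i, aa in enumerate(AMINO_ACIDS)}


def moment_features(sequence, max_order=3):
    """Return raw moments (orders 1..max_order) followed by
    central moments (orders 2..max_order) of the encoded sequence."""
    # Residues outside the 20 standard amino acids are skipped.
    values = np.array([CODE[aa] for aa in sequence if aa in CODE], dtype=float)
    if values.size == 0:
        raise ValueError("sequence contains no standard amino acids")
    mean = values.mean()
    raw = [np.mean(values ** k) for k in range(1, max_order + 1)]
    central = [np.mean((values - mean) ** k) for k in range(2, max_order + 1)]
    return np.array(raw + central)


features = moment_features("ACD")
```

A fixed-length vector of this kind could then serve as the input layer of a multilayer neural network trained with back-propagation, which is the classifier family the abstract names.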
