Identification of amyloidogenic peptides via optimized integrated features space based on physicochemical properties and PSSM.

Identifying amyloid proteins is increasingly important because their misaggregation can cause diseases such as Alzheimer's and Parkinson's. This paper focuses on the classification of amyloidogenic peptides and proposes a novel feature representation called PhyAve_PSSMDwt. It consists of two parts. The first is based on physicochemical properties (hydrophilicity, hydrophobicity, aggregation tendency, packing density and H-bonding) and yields 15 features in total. The second comprises 60 features selected by recursive feature elimination from a discrete wavelet transform of the position-specific scoring matrix (PSSM). In this step, a sliding window is introduced to reconstruct the PSSM so that evolutionary information can still be extracted from short sequences. Finally, a support vector machine is adopted as the classifier. Experimental results on the Pep424 dataset show that the PSSM information contributes substantially to performance. Compared with existing methods, our cross-validated results improve by 3.1%, 3.3%, 0.136 and 0.007 in accuracy, specificity, Matthews correlation coefficient and AUC, respectively, indicating that our method is effective and competitive.
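
To make the wavelet step concrete, the following is a minimal sketch of how a PSSM of arbitrary length might be turned into a fixed-length feature vector with a discrete wavelet transform. The abstract does not specify the wavelet family, the number of decomposition levels, or the summary statistics, so the Haar wavelet, the two levels, the mean and standard deviation summaries, and the function names `haar_step` and `dwt_features` are all illustrative assumptions. The sliding-window reconstruction, recursive feature elimination and SVM training are not shown.

```python
import math
from statistics import mean, pstdev


def haar_step(signal):
    """One level of the Haar DWT: returns (approximation, detail) coefficients."""
    if len(signal) % 2:
        # Assumption: pad odd-length signals by repeating the last value.
        signal = signal + [signal[-1]]
    s = math.sqrt(2.0)
    approx = [(signal[i] + signal[i + 1]) / s for i in range(0, len(signal), 2)]
    detail = [(signal[i] - signal[i + 1]) / s for i in range(0, len(signal), 2)]
    return approx, detail


def dwt_features(pssm, levels=2):
    """Summarise each PSSM column by the mean and standard deviation of its
    Haar coefficients (hypothetical choice of statistics).

    pssm is a list of rows, one row of 20 scores per residue.
    The output length is 20 * (2 * levels + 2), independent of peptide length.
    """
    features = []
    for column in zip(*pssm):
        approx = list(column)
        for _ in range(levels):
            approx, detail = haar_step(approx)
            features += [mean(detail), pstdev(detail)]
        # Keep a summary of the final low-frequency approximation as well.
        features += [mean(approx), pstdev(approx)]
    return features


# Toy PSSMs for a 6-residue and a 15-residue peptide (made-up scores).
pssm_short = [[(i + j) % 5 - 2 for j in range(20)] for i in range(6)]
pssm_long = [[(i * j) % 7 - 3 for j in range(20)] for i in range(15)]
print(len(dwt_features(pssm_short)), len(dwt_features(pssm_long)))
```

Under these assumptions, peptides of different lengths map to vectors of the same dimension, which is what allows a feature selector and a classifier to be applied afterwards.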
