Exploiting heterogeneous features to improve in silico prediction of peptide status – amyloidogenic or non-amyloidogenic

Background: Predicting short stretches in protein sequences that can form amyloid-like fibrils is important for understanding the underlying cause of amyloid illnesses, and thereby aids the discovery of sequence-targeted anti-aggregation pharmaceuticals. Because experimental molecular techniques are limited in their ability to identify such motif segments, computational methods that provide better and more affordable in silico predictions are highly desirable.

Results: Accurate in silico prediction of amyloidogenic peptide regions relies on the interplay between informative features and classifier design. In this article, we propose an efficient fibril prediction implementation that exploits heterogeneous features: bio-physio-chemical (BPC) properties, the auto-correlation function of carefully selected amino acid indices, and the atomic composition of a protein fragment within a window of amino acids. To obtain an optimal number of BPC features, we use an evolutionary Support Vector Machine (SVM) that integrates an SVM with a novel implementation of a hybrid Genetic Algorithm, termed a Memetic Algorithm. Five prediction modules designed using Artificial Neural Network (ANN) models are trained with independent and integrated features in order to validate the fibril-forming motifs. The results provide evidence that incorporating the auto-correlation function as a new feature alongside the BPC properties strengthens the sequence interaction effect captured in the feature vector, and thereby yields better prediction quality in terms of sensitivity, specificity, Matthews correlation coefficient (MCC), and area under the Receiver Operating Characteristic (ROC) curve.

Conclusion: A significant improvement in performance is observed when features that preserve the sequence order effect, such as the auto-correlation function, are introduced in addition to the conventional BPC properties selected through a novel optimization strategy to predict the peptide status (amyloidogenic or non-amyloidogenic). The proposed approach achieves acceptable results, comparable to most online predictors. It also addresses a gap in existing amyloid fibril prediction tools by maintaining a balance between sensitivity and specificity.
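To make the auto-correlation feature concrete, the following is a minimal sketch of how such a descriptor could be computed for a single peptide window. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not name the amino acid indices or the exact formula, so the Kyte-Doolittle hydropathy scale, the mean-centred form of the auto-correlation, and the names `KYTE_DOOLITTLE` and `autocorrelation` are all chosen here for illustration only.

```python
# Kyte-Doolittle hydropathy scale, used here only as an example index.
# ASSUMPTION: the abstract does not say which amino acid indices were selected.
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def autocorrelation(peptide, index=KYTE_DOOLITTLE, max_lag=3):
    """Return mean-centred auto-correlation values for lags 1..max_lag.

    ASSUMPTION: a simple averaged form is used,
        AC(d) = 1/(L - d) * sum_i (P_i - mean) * (P_{i+d} - mean),
    where P_i is the index value of residue i and L is the window length.
    """
    values = [index[aa] for aa in peptide]
    length = len(values)
    if max_lag >= length:
        raise ValueError("max_lag must be smaller than the peptide length")
    mean = sum(values) / length
    centred = [v - mean for v in values]
    features = []
    for lag in range(1, max_lag + 1):
        # Average the product of residue pairs that are `lag` positions apart.
        total = sum(centred[i] * centred[i + lag] for i in range(length - lag))
        features.append(total / (length - lag))
    return features


if __name__ == "__main__":
    # An alternating hydrophobic/charged hexapeptide gives a negative lag-1
    # value and a positive lag-2 value.
    print(autocorrelation("IDIDID", max_lag=2))  # [-16.0, 16.0]
```

Because each lag pairs residues at a fixed separation, the resulting values depend on residue order, which plain composition features do not capture. In a pipeline of the kind described, one such vector per selected index would be concatenated with the other feature groups.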

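The evaluation measures named in the abstract (sensitivity, specificity, and the Matthews correlation coefficient) can be computed from the four confusion-matrix counts. The sketch below uses their standard definitions. The function name `classification_metrics` and the example counts are hypothetical and are not results from the paper.

```python
import math


def classification_metrics(tp, tn, fp, fn):
    """Compute sensitivity, specificity, and MCC from confusion-matrix counts.

    tp/fn: amyloidogenic peptides predicted correctly/incorrectly.
    tn/fp: non-amyloidogenic peptides predicted correctly/incorrectly.
    Undefined ratios (zero denominators) are reported as 0.0 by convention.
    """
    sensitivity = tp / (tp + fn) if (tp + fn) else 0.0
    specificity = tn / (tn + fp) if (tn + fp) else 0.0
    denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denominator if denominator else 0.0
    return {"sensitivity": sensitivity, "specificity": specificity, "mcc": mcc}


if __name__ == "__main__":
    # Hypothetical counts chosen for illustration only.
    print(classification_metrics(tp=40, tn=40, fp=10, fn=10))
```

A predictor that labels almost everything as amyloidogenic scores high sensitivity but low specificity. Reporting both, together with MCC, makes that imbalance visible, which is the balance the abstract emphasises.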