Computational Prediction of Amyloidogenic Regions in Proteins: A Machine Learning Approach

Amyloidogenic regions in polypeptide chains are associated with a number of pathologies, including neurodegenerative diseases. Recent studies have shown that short regions of a protein are responsible for its amyloidogenic behavior. Identifying these short peptides is therefore critical for understanding diseases associated with protein aggregation. Because molecular techniques for identifying fibril-forming targets are limited, computational techniques offer a way to discover such targets in silico. We propose a machine learning method, based on a Support Vector Machine, to predict short amyloid fibril-forming stretches of peptides. The features used by the method are derived from the physicochemical properties of amino acids. To obtain an optimal subset of properties, feature selection based on a Genetic Algorithm is performed. The presented algorithm achieves a balanced prediction performance, in terms of true positive and false positive rates, when classifying a peptide as amyloidogenic or non-amyloidogenic, a balance that existing methods do not show.
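As a rough illustration of the feature selection idea described in the abstract, the following minimal Python sketch evolves a bit mask over candidate features with a simple genetic algorithm and scores each mask by the difference between true positive and false positive rates. Everything in it is an assumption made for illustration: the function names (`rates`, `centroid_fitness`, `select_features`), the GA settings, and the synthetic feature matrix are hypothetical, and a nearest-centroid classifier stands in for the Support Vector Machine so that the sketch needs only the standard library. It is not the authors' implementation.

```python
import random


def rates(y_true, y_pred):
    """Return (true positive rate, false positive rate) for binary labels."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    tpr = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return tpr, fpr


def centroid_fitness(X, y):
    """Build a fitness function: TPR minus FPR of a nearest-centroid
    classifier (a stand-in for the SVM) restricted to the masked features."""
    def fitness(mask):
        cols = [j for j, bit in enumerate(mask) if bit]
        if not cols:
            return -1.0  # an empty feature subset is never acceptable

        def centroid(label):
            rows = [x for x, t in zip(X, y) if t == label]
            return [sum(r[j] for r in rows) / len(rows) for j in cols]

        pos, neg = centroid(1), centroid(0)
        preds = []
        for x in X:
            d_pos = sum((x[j] - c) ** 2 for j, c in zip(cols, pos))
            d_neg = sum((x[j] - c) ** 2 for j, c in zip(cols, neg))
            preds.append(1 if d_pos < d_neg else 0)
        tpr, fpr = rates(y, preds)
        return tpr - fpr
    return fitness


def select_features(n_features, fitness, pop_size=20, generations=30,
                    mutation_rate=0.05, seed=0):
    """Genetic algorithm over 0/1 masks with elitism, one-point crossover
    and bit-flip mutation. Requires n_features >= 2."""
    rng = random.Random(seed)
    # Start from the full feature set plus random subsets.
    population = [[1] * n_features]
    population += [[rng.randint(0, 1) for _ in range(n_features)]
                   for _ in range(pop_size - 1)]
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        next_gen = [ranked[0][:]]  # elitism: keep the best mask unchanged
        parents = ranked[:max(2, pop_size // 2)]
        while len(next_gen) < pop_size:
            a, b = rng.sample(parents, 2)
            cut = rng.randint(1, n_features - 1)
            child = a[:cut] + b[cut:]
            child = [bit ^ 1 if rng.random() < mutation_rate else bit
                     for bit in child]
            next_gen.append(child)
        population = next_gen
    return max(population, key=fitness)


# Synthetic data (hypothetical): feature 0 tracks the label, the rest is noise.
data_rng = random.Random(1)
y = [1] * 20 + [0] * 20
X = [[t + data_rng.gauss(0, 0.3)] + [data_rng.gauss(0, 1) for _ in range(5)]
     for t in y]

fitness = centroid_fitness(X, y)
best = select_features(6, fitness)
print("selected mask:", best, "fitness:", fitness(best))
```

In a real setting, each row of `X` would hold physicochemical property values computed for one peptide, and the fitness would come from cross-validated classifier performance rather than from scoring on the training data as done here.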
