Prediction of amyloid fibrillar aggregates of polypeptide sequences: A soft computing approach

The deposition of amyloid fibrillar aggregates in the human brain results in amyloid diseases. Because these aggregates may spread in a virus-like manner, it is of primary importance to identify the motif regions in protein sequences that form them. The limitations of molecular techniques in identifying such regions have motivated sophisticated computational methods for retrieving them efficiently. In this paper we sought to improve the prediction performance of computational approaches by combining machine learning algorithms, an approach taken from a soft computing perspective. A filter-based dimensionality reduction algorithm was applied to the extracted features to obtain a minimal feature subset for decision tree classification. The filter is a multivariate statistical analysis based on mutual information, used as a combined measure of the maximum relevance and minimum redundancy (mRMR) of the features. We performed a stratified 10-fold cross-validation test to evaluate the accuracy of the predictor objectively.
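The pipeline described above (a mutual-information filter followed by a classifier evaluated with stratified 10-fold cross-validation) can be sketched in standard-library Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The features are assumed to be already discretized (mRMR on continuous physicochemical features normally needs a discretization step first), the greedy search uses the difference form of mRMR (relevance minus mean redundancy), and the function names `mutual_information`, `mrmr` and `stratified_kfold`, as well as the toy peptide data, are hypothetical. The decision tree itself is left out: the selected feature indices and the fold indices would be handed to whichever decision tree learner is used.

```python
import math
import random
from collections import Counter, defaultdict


def mutual_information(x, y):
    """Mutual information I(X;Y), in nats, between two discrete sequences."""
    n = len(x)
    px, py = Counter(x), Counter(y)
    pxy = Counter(zip(x, y))
    mi = 0.0
    for (a, b), c in pxy.items():
        # p(a,b) * log(p(a,b) / (p(a) * p(b))), expressed with raw counts
        mi += (c / n) * math.log(c * n / (px[a] * py[b]))
    return mi


def mrmr(features, labels, k):
    """Greedy mRMR selection, difference (MID) form: at each step pick the
    feature maximising relevance I(f; c) minus its mean redundancy I(f; s)
    with the already-selected features s. Returns indices in pick order."""
    k = min(k, len(features))
    if k < 1:
        return []
    relevance = [mutual_information(f, labels) for f in features]
    # First pick: the single most relevant feature.
    selected = [max(range(len(features)), key=lambda i: relevance[i])]
    while len(selected) < k:
        remaining = [i for i in range(len(features)) if i not in selected]

        def score(i):
            redundancy = sum(mutual_information(features[i], features[s])
                             for s in selected) / len(selected)
            return relevance[i] - redundancy

        selected.append(max(remaining, key=score))
    return selected


def stratified_kfold(labels, k=10, seed=0):
    """Yield (train_idx, test_idx) pairs for stratified k-fold CV. Each class
    is shuffled and dealt round-robin over the k folds, continuing where the
    previous class stopped, so class proportions stay as even as possible."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)
    fold_of = [0] * len(labels)
    offset = 0
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        for j, idx in enumerate(idxs):
            fold_of[idx] = (offset + j) % k
        offset = (offset + len(idxs)) % k
    for fold in range(k):
        test = [i for i in range(len(labels)) if fold_of[i] == fold]
        train = [i for i in range(len(labels)) if fold_of[i] != fold]
        yield train, test


# Hypothetical toy data: 8 peptides, label 1 = amyloidogenic, 0 = not.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
features = [
    [1, 1, 1, 0, 0, 0, 0, 0],  # f0: strongly relevant
    [1, 1, 1, 0, 0, 0, 0, 0],  # f1: exact copy of f0 (fully redundant)
    [0, 0, 0, 1, 0, 0, 0, 0],  # f2: weakly relevant, flags the peptide f0 misses
    [1, 0, 1, 0, 1, 0, 1, 0],  # f3: independent of the labels
]
print(mrmr(features, labels, k=2))  # -> [0, 2]: f1 is skipped as redundant
```

On the toy data, ranking by relevance alone would keep f0 and its duplicate f1, whereas the mRMR criterion replaces f1 with the weakly relevant but non-redundant f2. To keep the cross-validation estimate unbiased, the feature selection step would normally be rerun inside each training split rather than once on the full data set.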
