MetAmyl: A METa-Predictor for AMYLoid Proteins

The aggregation of proteins or peptides into amyloid fibrils is associated with a number of clinical disorders, including Alzheimer's, Huntington's and prion diseases, medullary thyroid cancer, and renal and cardiac amyloidosis. Despite extensive studies, the molecular mechanisms underlying the initiation of fibril formation remain largely unknown. Several lines of evidence indicate that short amino-acid segments (hot spots) located in amyloid precursor proteins act as seeds for fibril elongation. Hot spots are therefore potential targets for diagnostic and therapeutic applications, and a current challenge in bioinformatics is to develop methods that accurately predict hot spots from protein sequences. In this paper, we combine existing methods into a meta-predictor for hot-spot prediction, called MetAmyl (METa-predictor for AMYLoid proteins). MetAmyl is based on a logistic regression model that weights the predictions of a set of popular algorithms, statistically selected as the most informative and complementary predictors. We evaluated the performance of MetAmyl in a large-scale comparative study based on three independent datasets, and thereby demonstrated its ability to differentiate between amyloidogenic and non-amyloidogenic polypeptides. Compared with nine other methods, MetAmyl significantly improves prediction on the datasets studied. We further show that MetAmyl effectively highlights the effect of point mutations involved in human amyloidosis, and we suggest that the program could be a useful complementary tool for the diagnosis of these diseases.
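To make the idea of weighting individual predictors with a logistic regression concrete, the following is a minimal sketch and not the MetAmyl implementation. Everything in it is an assumption made for illustration: the three predictors are hypothetical stand-ins, the scores and labels are synthetic, and the model is fitted with plain gradient descent rather than with whatever fitting and predictor-selection procedure the authors used.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.5, epochs=2000):
    """Fit weights and intercept by batch gradient descent on the log-loss."""
    n, d = len(X), len(X[0])
    w, b = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(d):
                grad_w[j] += err * xi[j]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict_proba(w, b, x):
    """Combined probability that a segment is amyloidogenic."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Synthetic data (assumption): one score per hypothetical predictor.
# Predictor A is informative, B is weakly informative, C is pure noise.
random.seed(0)


def toy_scores(label):
    return [
        random.gauss(0.7 if label else 0.3, 0.15),
        random.gauss(0.6 if label else 0.4, 0.25),
        random.gauss(0.5, 0.25),
    ]


labels = [1] * 100 + [0] * 100
scores = [toy_scores(label) for label in labels]

weights, intercept = fit_logistic(scores, labels)
accuracy = sum(
    (predict_proba(weights, intercept, x) >= 0.5) == bool(label)
    for x, label in zip(scores, labels)
) / len(labels)

print("weights:", [round(wj, 2) for wj in weights])
print("intercept:", round(intercept, 2))
print("training accuracy:", round(accuracy, 3))
```

In this toy setting the fitted weight of each predictor reflects how much it contributes to separating the two classes, which is the sense in which a logistic regression "weights" the individual methods. Accuracy is measured on the training data here, so it says nothing about performance on independent datasets.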
