Unsupervised Integration of Multiple Protein Disorder Predictors: The Method and Evaluation on CASP7, CASP8 and CASP9 Data

Background

Studies of intrinsically disordered proteins, which lack a stable tertiary structure but still have important biological functions, rely critically on computational methods that predict this property from sequence information. Although a number of fairly successful models for predicting protein disorder have been developed over the last decade, the quality of their predictions is limited by the number of available cases of confirmed disorder.

Results

To estimate protein disorder from protein sequences more reliably, an iterative algorithm is proposed that integrates the predictions of multiple disorder models without relying on any protein sequences with confirmed disorder annotation. The iterative method alternates between the maximum a posteriori (MAP) estimation of the disorder prediction and the maximum-likelihood (ML) estimation of the quality of the individual disorder predictors. Experiments on data used at CASP7, CASP8, and CASP9 have shown the effectiveness of the proposed algorithm.

Conclusions

The proposed algorithm can potentially be used to predict protein disorder and to provide helpful suggestions on choosing suitable disorder predictors for unknown protein sequences.
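The alternation described in the Results can be illustrated with a minimal sketch. It is not the authors' implementation: it assumes binary per-residue calls, predictors that err independently given the true state, a fixed prior of 0.5, and quality expressed as per-predictor sensitivity and specificity. The function name `integrate_predictors` and all parameters are hypothetical.

```python
import numpy as np

def integrate_predictors(votes, n_iter=50, prior=0.5, eps=1e-6):
    """Combine binary disorder calls without any labelled residues.

    votes: array of shape (n_residues, n_predictors), 1 = disordered.
    Returns (posterior, sensitivity, specificity).
    Assumption: predictors are conditionally independent given the truth.
    """
    votes = np.asarray(votes, dtype=float)
    # Start from the fraction of predictors voting "disordered".
    mu = votes.mean(axis=1)
    for _ in range(n_iter):
        # Quality step: sensitivity and specificity of each predictor,
        # measured against the current soft consensus (eps avoids 0 and 1).
        sens = (mu @ votes + eps) / (mu.sum() + 2 * eps)
        spec = ((1 - mu) @ (1 - votes) + eps) / ((1 - mu).sum() + 2 * eps)
        # Consensus step: posterior probability of disorder per residue.
        log_p1 = (np.log(prior) + votes @ np.log(sens)
                  + (1 - votes) @ np.log(1 - sens))
        log_p0 = (np.log(1 - prior) + (1 - votes) @ np.log(spec)
                  + votes @ np.log(1 - spec))
        mu = 1.0 / (1.0 + np.exp(np.clip(log_p0 - log_p1, -500, 500)))
    return mu, sens, spec
```

Thresholding the returned posterior at 0.5 gives a MAP-style label for each residue, and the estimated sensitivity and specificity give a rough ranking of the predictors. Both are only as trustworthy as the independence assumption.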
