Prediction of Boundaries Between Intrinsically Ordered and Disordered Protein Regions

Using proteins with both disordered and ordered regions, collected through literature searches and database scanning, we assembled a set of 24-residue-long segments centered on their order/disorder boundaries, as well as a larger set of non-boundary segments consisting entirely of either ordered or disordered residues. We analyzed position-specific amino acid compositions around the order/disorder boundaries and found more than thirty significant (p < 0.05) compositional differences between the boundary and non-boundary data. Based on this analysis, we constructed several logistic regression predictors of order/disorder boundaries, each using a slightly different data modeling approach. Exact boundary prediction accuracies were estimated to range from 74% to 80% across the predictors.
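
As an illustration of the segment assembly and position-specific composition analysis described above, the following is a minimal sketch and not the authors' implementation. It assumes a hypothetical per-residue annotation string in which 'O' marks an ordered residue and 'D' a disordered one, and it assumes a boundary is centered by taking 12 residues on each side. All function names are illustrative.

```python
from collections import Counter

WINDOW = 24          # segment length stated in the text
HALF = WINDOW // 2   # assumption: 12 residues on each side of the boundary


def boundary_positions(labels):
    """Indices i where the label changes between residue i-1 and residue i."""
    return [i for i in range(1, len(labels)) if labels[i] != labels[i - 1]]


def boundary_segments(sequence, labels):
    """Collect full-length windows centered on each order/disorder boundary.

    Boundaries too close to a terminus to yield a complete window are skipped.
    """
    segments = []
    for b in boundary_positions(labels):
        start, end = b - HALF, b + HALF
        if start >= 0 and end <= len(sequence):
            segments.append(sequence[start:end])
    return segments


def position_specific_composition(segments):
    """For each window position, the fraction of segments with each amino acid."""
    if not segments:
        return []
    n = len(segments)
    return [
        {aa: count / n
         for aa, count in Counter(seg[pos] for seg in segments).items()}
        for pos in range(WINDOW)
    ]


# Toy input: 12 ordered alanines followed by 12 disordered glycines.
toy_sequence = "A" * 12 + "G" * 12
toy_labels = "O" * 12 + "D" * 12  # hypothetical annotation encoding
toy_segments = boundary_segments(toy_sequence, toy_labels)
toy_composition = position_specific_composition(toy_segments)
```

Per-position frequencies of this kind, computed separately for boundary and non-boundary segments, are the sort of quantity that could be compared statistically and supplied as features to a logistic regression model.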