Exemplar-Based Learning to Predict Protein Folding

Abstract

A new machine learning technique is presented for predicting protein secondary structure from primary sequence data, a valuable step towards understanding protein folding. The technique stores large numbers of points in a multi-dimensional space and uses an extensively modified nearest neighbor method to make predictions. The learning program was trained on a set of 101 proteins of known structure and tested on a separate set of 28 additional proteins. The maximum overall predictive accuracy was 71.0%, which surpasses the results of recent tests using neural nets as well as other, more traditional methods. We further observed that some sequences of residues were considerably easier to classify than others.
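
As a rough illustration of the general idea of storing exemplars and classifying by nearest neighbor lookup, the following is a minimal sketch in Python. It is a plain nearest neighbor classifier, not the extensively modified method presented here. The window width, the mismatch-count distance, the structure labels (H, E, C), and the toy sequences are all assumptions made for illustration and are not taken from this work.

```python
from collections import Counter

def make_windows(sequence, half_width=2, pad="-"):
    """Return one fixed-width window centered on each residue of `sequence`."""
    padded = pad * half_width + sequence + pad * half_width
    width = 2 * half_width + 1
    return [padded[i:i + width] for i in range(len(sequence))]

def mismatch_distance(a, b):
    """Number of positions at which two equal-length windows differ."""
    return sum(x != y for x, y in zip(a, b))

def predict(exemplars, window, k=1):
    """Majority label among the k stored exemplars closest to `window`."""
    nearest = sorted(exemplars, key=lambda ex: mismatch_distance(ex[0], window))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

# Toy, made-up training data (hypothetical, not from the paper).
# Assumed labels: H = helix, E = strand, C = coil.
train_seq = "AAAAGGVVVV"
train_lab = "HHHHCCEEEE"

# "Training" is just storing every (window, label) pair as an exemplar.
exemplars = list(zip(make_windows(train_seq), train_lab))

# Predict one label per residue of an unseen toy sequence.
test_seq = "AAAGVVV"
predicted = "".join(predict(exemplars, w) for w in make_windows(test_seq))
print(predicted)
```

In this sketch each residue is represented by the short window of sequence around it, so a prediction depends only on local context. Refinements to the distance measure or to how exemplars are stored would replace `mismatch_distance` and the plain list of exemplars.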