Use of Inductive Logic Programming to Learn Principles of Protein Structure

Inductive Logic Programming (ILP) has been applied to learn rules that characterise protein folds. Several representations for the background set have been explored, and the results have been interpreted in their biological context. In this paper, we present new results obtained with a background set containing information about protein topology. The new rules are more descriptive than the previous ones: where previous rules represented local motifs, often associated with functional regions, the new rules give more complete descriptions of a fold, often similar to those found in SCOP. Cross-validation experiments were conducted for the 20 most populated folds. The overall cross-validated accuracy was 75.1 ± 1.6 % with the more limited background knowledge and 82.1 ± 1.4 % with the additional information.
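
Accuracy figures of the form "75.1 ± 1.6 %" pair a point estimate with an uncertainty term. The abstract does not state how the ± term was obtained, so the following is only a minimal sketch of one common convention: the binomial standard error of a proportion of correctly classified test examples. The function name and the counts in the usage lines are hypothetical and are not taken from the experiments reported here.

```python
import math


def accuracy_with_standard_error(n_correct, n_total):
    """Return (accuracy, standard error), both in percent.

    Assumption: the error term is the binomial standard error
    sqrt(p * (1 - p) / n). The paper may have used another estimator.
    """
    if n_total <= 0:
        raise ValueError("n_total must be positive")
    if not 0 <= n_correct <= n_total:
        raise ValueError("n_correct must lie between 0 and n_total")
    p = n_correct / n_total
    se = math.sqrt(p * (1.0 - p) / n_total)
    return 100.0 * p, 100.0 * se


# Hypothetical counts, for illustration only: 300 of 400 pooled
# cross-validation test examples classified correctly.
acc, se = accuracy_with_standard_error(300, 400)
print(f"{acc:.1f} ± {se:.1f} %")
```

Under this convention the error term shrinks with the square root of the number of test examples, so two accuracies are easier to tell apart when the pooled test set is large.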