Using Domain Knowledge in ILP to Discover Protein Functional Models

The paper describes a method for machine discovery of protein functional models from protein databases using Inductive Logic Programming (ILP). The method uses domain knowledge in ILP to generate appropriate hypotheses that predict the functions of a protein from its amino acid sequence. It is based on a top-down search for the relative least general generalization (RLGG) and uses domain knowledge that defines both the conceptual hierarchy of protein functions and the search biases. The method effectively discovers protein functional models that explain the relationship between the functions of proteins and their amino acid sequences, as described in protein databases. It succeeds in discovering functional models for forty membrane proteins, and these models coincide with those conjectured in the molecular biology literature.
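The generalization operation the abstract refers to can be illustrated with a small sketch. The code below is not the paper's method. It implements plain Plotkin-style least general generalization (anti-unification) of two terms, not the relative version (RLGG), which additionally generalizes with respect to background clauses, and it does not perform the paper's top-down search. As an assumption about how a conceptual hierarchy of functions *could* enter, two differing constants are replaced by their lowest common ancestor in a hypothetical is-a tree, rather than by a variable, when such an ancestor exists. The names `generalize`, `common_ancestor`, `Var`, and all predicate and function labels are illustrative only.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple, Union


@dataclass(frozen=True)
class Var:
    """A logic variable introduced by generalization (illustrative)."""
    name: str


# A term is a constant (str), a variable (Var), or a compound term
# written as a tuple (functor, arg1, ..., argN).
Term = Union[str, Var, Tuple]


def ancestors(c: str, parent: Dict[str, str]) -> list:
    """Return c followed by its ancestors in a tree-shaped is-a hierarchy."""
    chain = [c]
    while c in parent:
        c = parent[c]
        chain.append(c)
    return chain


def common_ancestor(a: str, b: str, parent: Dict[str, str]) -> Optional[str]:
    """Lowest common ancestor of a and b in the tree, or None if there is none."""
    seen = set(ancestors(a, parent))
    for c in ancestors(b, parent):
        if c in seen:
            return c
    return None


def generalize(s: Term, t: Term,
               parent: Optional[Dict[str, str]] = None,
               table: Optional[Dict[Tuple[Term, Term], Var]] = None) -> Term:
    """Least general generalization (anti-unification) of two terms.

    Hypothetical extension: if a hierarchy `parent` is given, two different
    constants become their lowest common ancestor instead of a fresh variable
    whenever one exists.
    """
    parent = {} if parent is None else parent
    table = {} if table is None else table
    if s == t:
        return s
    if (isinstance(s, tuple) and isinstance(t, tuple)
            and len(s) == len(t) and s[0] == t[0]):
        # Same functor and arity: generalize argument by argument, sharing
        # one table so a repeated pair of subterms maps to the same variable.
        return (s[0],) + tuple(generalize(a, b, parent, table)
                               for a, b in zip(s[1:], t[1:]))
    if isinstance(s, str) and isinstance(t, str):
        anc = common_ancestor(s, t, parent)
        if anc is not None:
            return anc
    # Otherwise replace the mismatched pair with a variable, reusing it
    # whenever the same pair occurs again.
    if (s, t) not in table:
        table[(s, t)] = Var(f"X{len(table)}")
    return table[(s, t)]
```

For example, with the hypothetical hierarchy `{"ion_channel": "transporter", "pump": "transporter", "transporter": "membrane_protein"}`, generalizing `("function", "p1", "ion_channel")` and `("function", "p2", "pump")` yields `("function", Var("X0"), "transporter")`. The protein identifiers become a variable, and the two functions are lifted to their shared parent concept. Preferring the common ancestor keeps the hypothesis as specific as the hierarchy allows, which is one plausible reading of why a function hierarchy helps constrain hypothesis generation.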