Declarative Knowledge Discovery in Industrial Databases

Industry is increasingly overwhelmed by large volumes of data. For example, the pharmaceutical industry generates vast quantities of data both internally, as a side-effect of screening tests and combinatorial chemistry, and externally, from sources such as the human genome project. Industry is also becoming predominantly knowledge-driven. Increased understanding not only improves products, but is also central to market assessment and strategic decision making. From a computer science point of view, the knowledge requirements within industry often give higher emphasis to "knowing that" (declarative or descriptive knowledge) than to "knowing how" (procedural or prescriptive knowledge). Mathematical logic has always been the preferred representation for declarative knowledge, and thus knowledge discovery techniques are required which generate logical formulae from data. Inductive Logic Programming (ILP) is such a technique. Logic programs provide a powerful and flexible representation for constraints, grammars, plans, equations and temporal relationships. New techniques developed within the 1990s allow general-purpose ILP systems to construct logic programs from a mixture of raw data and encoded domain knowledge. This paper will review the results of the last few years' academic pilot studies involving the application of ILP to problems in the pharmaceutical, telecommunications and automobile industries. While predictive accuracy is the central performance measure of data analytical techniques which generate procedural knowledge (neural nets, decision trees, etc.), the performance of an ILP system is determined both by accuracy and by the degree of insight provided. ILP hypotheses can be easily stated in English and can often be automatically exemplified pictorially. This allows cross-checking with other relevant domain knowledge.

In several of the comparative trials presented, ILP systems provided significant insights where other data analysis techniques did not. The scene now appears to be set for commercially-oriented application of ILP in industry.

1. Stephen Muggleton has recently accepted an invitation to take up the new Chair of Machine Learning at the University of York.
