An inductive database and query language in the relational model

In this demonstration, we present the concepts and an implementation of an inductive database, as proposed by Imielinski and Mannila, in the relational model. The goal is to support all steps of the knowledge discovery process, from pre-processing through data mining to post-processing, by means of queries to a database system. The query language SIQL (structured inductive query language), an extension of SQL, offers query primitives for feature selection, discretization, pattern mining, clustering, instance-based learning and rule induction. A prototype system that processes such queries was implemented as part of the SINDBAD (structured inductive database development) project. Two key concepts of this system are the closure of operators, meaning that the result of every query is again a relation that can be queried further, and distances between objects. To support the analysis of multi-relational data, we incorporated multi-relational distance measures based on set distances and recursive descent. The inclusion of rule-based classification models made it necessary to extend the data model and the software architecture significantly. The prototype is applied in three domains: gene expression analysis, gene regulation prediction and structure-activity relationships (SARs) of small molecules.
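To illustrate what a multi-relational distance based on set distances and recursive descent can look like, the following is a minimal Python sketch, not the SINDBAD implementation. It assumes that a tuple is a Python tuple whose attributes are numeric, nominal, or a nested list of related tuples (standing in for rows reached via a foreign key). The function names `value_distance`, `tuple_distance` and `set_distance` are hypothetical, and the Hausdorff distance is used here only as one simple choice of set distance; the abstract does not state which set distance the system uses.

```python
def value_distance(a, b):
    """Distance between two attribute values (assumed encoding, see lead-in)."""
    # Nested lists represent sets of related tuples: descend recursively.
    if isinstance(a, list) and isinstance(b, list):
        return set_distance(a, b)
    # Numeric attributes: absolute difference (no normalisation in this sketch).
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return abs(a - b)
    # Nominal attributes: 0 if equal, 1 otherwise.
    return 0.0 if a == b else 1.0


def tuple_distance(s, t):
    """Mean attribute-wise distance between two tuples of equal arity."""
    return sum(value_distance(a, b) for a, b in zip(s, t)) / len(s)


def set_distance(xs, ys):
    """Hausdorff distance between two sets of tuples."""
    if not xs and not ys:
        return 0.0
    if not xs or not ys:
        # Assumed convention for comparing an empty set with a non-empty one.
        return 1.0
    d_xy = max(min(tuple_distance(x, y) for y in ys) for x in xs)
    d_yx = max(min(tuple_distance(x, y) for x in xs) for y in ys)
    return max(d_xy, d_yx)


# Two toy "molecules": (weight, set of atoms), each atom is (element, charge).
mol_a = (1.0, [("C", 0.0), ("O", 0.5)])
mol_b = (1.0, [("C", 0.0)])
dist_ab = tuple_distance(mol_a, mol_b)
```

In this toy example the distance between the two molecules is driven entirely by the nested atom sets, since the top-level weights are equal. A distance of this kind is what instance-based learning or clustering over multi-relational data would be built on.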
