LDL-Mine: Integrating Data Mining with Intelligent Query Answering

Current applications of data mining techniques highlight the need for flexible knowledge discovery systems, capable of supporting the user in specifying and refining mining objectives, combining multiple strategies, and defining the quality of the extracted knowledge. A key issue is the definition of a Knowledge Discovery Support Environment, i.e., a query system capable of obtaining, maintaining, representing and using high-level knowledge in a unified framework. This comprises the representation and manipulation of domain knowledge, the extraction and manipulation of new knowledge, and their combination. In this context, in [2,3] we envisaged an integrated architecture for data mining, which was further developed and tested in [5] and resulted in the LDL-Mine environment. The basic philosophy of the environment is to integrate inductive and deductive capabilities in a unified framework. An LDL-Mine program is composed of three main parts: source knowledge, modeled by facts; background knowledge, modeled by deductive clauses; and induced knowledge, modeled by inductive clauses. Inductive clauses provide a suitable interface to data mining algorithms: they define predicates that represent mined patterns, yet these predicates can be used like deductive predicates and facts. This makes it possible to amalgamate induction and deduction, and to model both the interactive and the iterative features of a data mining process. Figure 1 shows the main features of the system. LDL-Mine is built on top of the LDL++ system [8] and exploits most of its functionality, such as the application programming interface, the deductive engine, and access to external databases. In addition, LDL-Mine implements an inductive engine that, by means of inductive clauses, allows mining algorithms and deductive components to interact.
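The three-part structure described above can be illustrated with a minimal Python sketch. It is not LDL-Mine syntax and does not use the LDL++ API; the names BASKET, CATEGORY, transactions, frequent_pairs and same_category_pairs, as well as the sample data, are hypothetical and chosen only for illustration. Facts play the role of source knowledge, a lookup table and a grouping function stand in for background knowledge, and a simple frequent-pair miner stands in for an inductive clause whose result is then queried like any other predicate.

```python
from collections import Counter
from itertools import combinations

# Source knowledge: facts, here as (transaction_id, item) tuples (made-up data).
BASKET = [
    (1, "bread"), (1, "milk"),
    (2, "bread"), (2, "milk"), (2, "beer"),
    (3, "milk"), (3, "beer"),
    (4, "bread"), (4, "milk"),
]

# Background knowledge, kept as a lookup table for brevity (hypothetical).
CATEGORY = {"bread": "food", "milk": "food", "beer": "drink"}


def transactions(facts):
    """Deductive step: group basket facts into one item set per transaction."""
    grouped = {}
    for tid, item in facts:
        grouped.setdefault(tid, set()).add(item)
    return grouped


def frequent_pairs(facts, min_support):
    """Inductive step: item pairs found in at least min_support transactions."""
    counts = Counter()
    for items in transactions(facts).values():
        counts.update(combinations(sorted(items), 2))
    return {pair: n for pair, n in counts.items() if n >= min_support}


def same_category_pairs(facts, min_support):
    """Deduction over induced knowledge: frequent pairs within one category."""
    return sorted(
        pair for pair in frequent_pairs(facts, min_support)
        if CATEGORY[pair[0]] == CATEGORY[pair[1]]
    )
```

Calling same_category_pairs(BASKET, 2) filters the mined pairs through the background knowledge, which mirrors, in a very simplified way, how an induced predicate can feed further deductive rules.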
In its current stage, the inductive engine implements three main data mining schemes: association rule mining [1], Bayesian classification [7], and discretization of continuous attributes, both supervised and unsupervised [6]. Each scheme corresponds to a specific inductive clause. In the following, we show by example how the notion of an inductive clause is formalized within the LDL-Mine system, and present some of the inductive clauses currently implemented.