Learning Semantic Web Rules: State of the Art and Directions of Research

The acquisition of Semantic Web rules is very demanding and can be automated, though only partially, by applying Machine Learning (ML) algorithms. In this paper we provide a state-of-the-art survey of ML research relevant to this issue. In particular, we take a critical look at three ML frameworks that extend the methodological apparatus of Inductive Logic Programming to hybrid Knowledge Representation systems combining Description Logics and Clausal Logics. From the comparison of the three we draw general conclusions that suggest directions of research on this topic.

1 Motivation and Scope

Rules are currently a focus of attention within the Semantic Web architecture, and consequently interest and activity in this area have grown rapidly over recent years. Rules would allow the integration, transformation, and derivation of data from numerous sources in a distributed, scalable, and transparent manner. The rules landscape features design aspects of rule markup; engineering of engines, translators, and other tools; standardization efforts, such as the recent Rule Interchange Format (RIF) activity at W3C; and applications. Rules complement and extend ontologies on the Semantic Web. They can be used in combination with ontologies, or as a means to specify ontologies. Rules are also frequently applied over ontologies, to draw inferences, express constraints, specify policies, react to events, discover new knowledge, transform data, and so on. Rule markup languages enrich Web ontologies by supporting the publishing of rules on the Web, the exchange of rules between different systems and tools, the sharing of guidelines and policies, the merging and maintenance of rulebases, and more. Yet, whereas the markup language OWL for Semantic Web ontologies is already undergoing the second round of the standardization process at W3C, the debate around a RIF is still ongoing.
Because of the great variety in rule languages and rule engine technologies, this format will consist of a core language to be used along with a set of standard and non-standard extensions. These extensions need not all be combinable into a single unified language. As for expressive power, two directions are being followed: monotonic extensions towards full First Order Logic (FOL) and non-monotonic extensions based on the Logic Programming tradition, i.e. on Clausal Logics (CLs). Since the design of OWL has been based on Description Logics (DLs) [1] (more precisely, on the SH family of very expressive DLs [13]), non-monotonic rule languages for the Semantic Web will most likely be inspired by earlier hybrid Knowledge Representation (KR) systems such as AL-log [6] and Carin [17], which integrate DLs and (fragments of) CLs. Such rule formalisms are of interest to us. Other uses of rules, e.g. in OWL 2, are beyond the scope of this paper.

The acquisition of Semantic Web rules is very demanding and can be automated, though only partially, by applying Machine Learning (ML) algorithms. The ML approach known under the name of Inductive Logic Programming (ILP) [28] seems particularly promising for the following reasons. ILP is a form of Concept Learning rooted in Logic Programming. It has therefore historically been concerned with rule induction from examples and background knowledge within the KR framework of Horn Clausal Logic (HCL), with the aim of prediction. The distinguishing feature of ILP, also with respect to other forms of Concept Learning, is the use of prior knowledge during the induction process. We claim that learning Semantic Web rules can be reformulated as learning rules with ontologies as prior knowledge. This may motivate an interest of the Semantic Web community in ILP. In this paper we take a critical look at three ILP attempts at learning rules within hybrid DL-CL KR frameworks.
From the comparative analysis of the three we shall draw general conclusions that can be considered guidelines for further ILP research of interest to the Semantic Web.

The paper is organized as follows. Section 2 first provides essential information on the ILP methodological apparatus for readers unfamiliar with it. Section 3 briefly describes three major forms of integration of DLs and CLs. Section 4 provides a state-of-the-art survey of ILP proposals for the hybrid DL-CL formalisms considered in Section 3 and outlines directions of future work. Section 5 concludes the paper with final remarks. Appendices A and B provide the basic notions of DLs and CLs, respectively.

2 Learning rules with ILP

Inductive Logic Programming (ILP) was born at the intersection of Concept Learning and Logic Programming. From Concept Learning it has inherited the inferential mechanisms for induction, the most prominent of which is generalization. A distinguishing feature of ILP with respect to other forms of Concept Learning is the use of background knowledge (BK). From Logic Programming it has borrowed the KR framework, i.e. HCL. In Concept Learning, and thus in ILP, generalization is traditionally viewed as search through a partially ordered space of inductive hypotheses [26]. According to this view, an inductive hypothesis is a clausal theory, and the induction of a single clause requires (i) structuring, (ii) searching, and (iii) bounding the space of clauses [28].

First we focus on (i) by clarifying the notion of ordering for clauses. An ordering allows us to determine which of two clauses is more general than the other. Since partial orders are considered, incomparable pairs of clauses are admitted. One such ordering is θ-subsumption [29]: given two clauses C and D, we say that C θ-subsumes D if there exists a substitution θ such that Cθ ⊆ D (see Appendix B for the set notation used for clauses). Given the usefulness of BK, orderings have been proposed that take it into account.
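As an illustration, the θ-subsumption test can be implemented directly. The following is a minimal sketch, assuming a simple encoding in which a clause is a list of (predicate, arguments) literals, strings beginning with an uppercase letter stand for variables, and negated literals are folded into predicate names:

```python
def is_var(t):
    """Prolog-style convention (an assumption of this sketch):
    uppercase-initial strings are variables."""
    return isinstance(t, str) and t[:1].isupper()

def match(lit_c, lit_d, theta):
    """Try to extend substitution theta so that lit_c{theta} == lit_d."""
    (p, args_c), (q, args_d) = lit_c, lit_d
    if p != q or len(args_c) != len(args_d):
        return None
    theta = dict(theta)
    for a, b in zip(args_c, args_d):
        if is_var(a):
            if theta.get(a, b) != b:
                return None          # variable already bound differently
            theta[a] = b
        elif a != b:
            return None              # constant mismatch
    return theta

def theta_subsumes(C, D, theta=None):
    """C theta-subsumes D iff some substitution maps C's literals into D."""
    theta = {} if theta is None else theta
    if not C:
        return True
    first, rest = C[0], C[1:]
    for lit_d in D:                  # backtracking search over D's literals
        extended = match(first, lit_d, theta)
        if extended is not None and theta_subsumes(rest, D, extended):
            return True
    return False

# p(X, Y) <- q(X)  theta-subsumes  p(a, b) <- q(a), r(b)  via theta = {X/a, Y/b}
C = [("p", ("X", "Y")), ("not_q", ("X",))]
D = [("p", ("a", "b")), ("not_q", ("a",)), ("not_r", ("b",))]
print(theta_subsumes(C, D))   # True
```

Since θ-subsumption is purely syntactic, the check consults no background knowledge; generalized subsumption would additionally reason with the definite program K.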
Among them, generalized subsumption [2] is of major interest to this paper: given two definite clauses C and D standardized apart and a definite program K, we say that C ≽K D iff there exists a ground substitution θ for C such that (i) head(C)θ = head(D)σ and (ii) K ∪ body(D)σ |= body(C)θ, where σ is a Skolem substitution for D with respect to {C} ∪ K. Generalized subsumption is also called semantic generality, in contrast to θ-subsumption, which is a purely syntactic generality. In the general case, generalized subsumption is undecidable and does not induce a lattice on a set of clauses. Because of these problems, θ-subsumption is more frequently used in ILP systems. Yet for Datalog, generalized subsumption is decidable and admits a least general generalization.

Once structured, the space of hypotheses can be searched (ii) by means of refinement operators. A refinement operator is a function which computes a set of specializations or generalizations of a clause, according to whether a top-down or a bottom-up search is performed. The two kinds of refinement operator are therefore called downward and upward, respectively. The definition of refinement operators presupposes an investigation of the properties of the various orderings and is usually coupled with the specification of a declarative bias for bounding the space of clauses (iii). Bias concerns anything which constrains the search for theories; e.g., a language bias specifies syntactic constraints on the clauses in the search space.

Induction with ILP generalizes from individual instances/observations in the presence of BK, finding valid hypotheses. Validity depends on the underlying setting. At present, there exist several formalizations of induction in ILP that can be classified according to the following two orthogonal dimensions: the scope of induction (discrimination vs. characterization) and the representation of observations (ground definite clauses vs. ground unit clauses) [5].
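To make the idea of a downward refinement operator concrete, here is a toy sketch that specializes a Datalog clause either by grounding a variable or by adding a body literal drawn from a declarative language bias. The clause encoding, the bias format, and the constant pool are illustrative assumptions, not a standard ILP interface:

```python
def refine(clause, bias, constants):
    """Yield one-step specializations of `clause`, given as (head, body),
    where each literal is a (predicate, argument-tuple) pair and
    uppercase-initial strings are variables."""
    head, body = clause
    vars_in = sorted({t for _, args in [head] + body
                      for t in args if t[:1].isupper()})
    # (a) specialize by grounding one variable to one constant
    for v in vars_in:
        for c in constants:
            yield ((head[0], tuple(c if t == v else t for t in head[1])),
                   [(p, tuple(c if t == v else t for t in a))
                    for p, a in body])
    # (b) specialize by adding a body literal allowed by the language bias,
    #     here naively filled with the first variables of the clause
    for p, arity in bias:
        args = tuple(vars_in)[:arity]
        if len(args) == arity:
            yield (head, body + [(p, args)])

# Refine the over-general clause daughter(X, Y) <- (empty body).
clause = (("daughter", ("X", "Y")), [])
for r in refine(clause, bias=[("parent", 2), ("female", 1)],
                constants=["ann"]):
    print(r)
```

An upward operator would perform the inverse steps (turning constants into variables, dropping literals), and a practical operator would enumerate all argument combinations and enforce the bias more carefully.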
Discriminant induction aims at inducing hypotheses with discriminant power, as required in tasks such as classification. In classification, observations encompass both positive and negative examples. Characteristic induction is more suitable for finding regularities in a data set. This corresponds to learning from positive examples only. The second dimension affects the notion of coverage, i.e. the condition under which a hypothesis explains an observation. In learning from entailment, hypotheses are clausal theories, observations are ground definite clauses, and a hypothesis covers an observation if the hypothesis logically entails the observation. In learning from interpretations, hypotheses are clausal theories, observations are Herbrand interpretations (sets of ground unit clauses), and a hypothesis covers an observation if the observation is a model of the hypothesis.

3 KR behind Semantic Web Rules

The definition of a rule language for the Semantic Web follows the tradition of KR research on hybrid systems, i.e. systems constituted by two or more subsystems dealing with distinct portions of a single KB and performing specific reasoning procedures [10]. The motivation for investigating and developing such systems is to improve on two basic features of KR formalisms, namely representational adequacy and deductive power, while preserving the other crucial feature, i.e. decidability. Indeed, DLs and CLs are FOL fragments incomparable as to expressiveness and semantics but combinable under certain conditions [30]. In particular, combining DLs with HCL can easily lead to undecidability if the interaction scheme between the DL and the CL part of a hybrid KB does not fulfill the condition of safeness, i.e. does not resolve the semantic mismatch between DLs and CLs [27,31]. A comprehensive study of the effects of combining DLs and CLs (more precisely, Horn rules) can be found in [17].
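The safeness condition mentioned above can be illustrated with a toy check in the spirit of DL-safe rules: every variable of a rule must occur in at least one non-DL (Datalog) body atom, so that the rule applies only to explicitly named individuals. The rule encoding and the split into DL and non-DL predicates below are illustrative assumptions:

```python
def is_safe(head, body, dl_predicates):
    """Check a DL-safeness-style condition: every variable of the rule
    occurs in some body atom whose predicate is NOT a DL predicate."""
    def vars_of(atoms):
        # uppercase-initial strings are variables (sketch convention)
        return {t for _, args in atoms for t in args if t[:1].isupper()}
    non_dl_body = [(p, a) for p, a in body if p not in dl_predicates]
    return vars_of([head] + list(body)) <= vars_of(non_dl_body)

# works_with(X, Y) <- Person(X), Person(Y), same_project(X, Y)
head = ("works_with", ("X", "Y"))
body = [("Person", ("X",)), ("Person", ("Y",)), ("same_project", ("X", "Y"))]
print(is_safe(head, body, dl_predicates={"Person"}))   # True
```

Dropping the same_project atom would leave X and Y occurring only in DL atoms, and the check would fail, mirroring how unsafe interaction schemes endanger decidability.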
Special attention is devoted to the DL ALCNR. The results of the study can be summarized as follows: (i) answering conjunctive queries over ALCNR TBoxes is decidable, and (ii) query answering in ALCNR extended with non-recursive Datalog rules, where both concepts and roles can occur in rule bodies, is also decidable, as it can be reduced to answering a union of conjunctive queries.

References

[1] Francesco M. Donini et al. AL-log: Integrating Datalog and Description Logics. Journal of Intelligent Information Systems, 1998.
[2] Riccardo Rosati et al. Semantic and Computational Advantages of the Safe Integration of Ontologies and Rules. PPSWR, 2005.
[3] Ian Horrocks et al. Conjunctive Query Answering for the Description Logic SHIQ. IJCAI, 2007.
[4] Wray L. Buntine. Generalized Subsumption and Its Applications to Induction and Redundancy. Artificial Intelligence, 1986.
[5] Boris Motik et al. Query Answering for OWL-DL with Rules. ISWC, 2004.
[6] Georg Gottlob et al. Disjunctive Datalog. ACM Transactions on Database Systems, 1997.
[7] Agnieszka Lawrynowicz et al. Towards Discovery of Frequent Patterns in Description Logics with Rules. RuleML, 2005.
[8] Ian Horrocks et al. From SHIQ and RDF to OWL: The Making of a Web Ontology Language. Journal of Web Semantics, 2003.
[9] Luc De Raedt et al. Clausal Discovery. Machine Learning, 1997.
[10] Donato Malerba et al. Bridging the Gap between Horn Clausal Logic and Description Logics in Inductive Learning. AI*IA, 2003.
[11] Jörg-Uwe Kietz et al. Learnability of Description Logic Programs. ILP, 2002.
[12] Ian Horrocks et al. Practical Reasoning for Very Expressive Description Logics. Logic Journal of the IGPL, 2000.
[13] Alan M. Frisch. The Substitutional Framework for Sorted Deduction: Fundamental Results on Hybrid Reasoning. Artificial Intelligence, 1991.
[14] Céline Rouveirol. Towards Learning in CARIN-ALN. 2000.
[15] Alon Y. Halevy et al. Combining Horn Rules and Description Logics in CARIN. Artificial Intelligence, 1998.
[16] Gordon Plotkin et al. A Note on Inductive Generalization. 2008.
[17] Francesca A. Lisi et al. Learning SHIQ+log Rules for Ontology Evolution. SWAP, 2008.
[18] Donato Malerba et al. Ideal Refinement of Descriptions in AL-log. ILP, 2003.
[19] Francesca A. Lisi et al. Building Rules on Top of Ontologies for the Semantic Web with Inductive Logic Programming. Theory and Practice of Logic Programming, 2007.
[20] Michael Gelfond et al. Classical Negation in Logic Programs and Disjunctive Databases. New Generation Computing, 1991.
[21] John W. Lloyd. Foundations of Logic Programming. Springer, 1984.
[22] Gert Smolka et al. Attributive Concept Descriptions with Complements. Artificial Intelligence, 1991.
[23] Diego Calvanese et al. The Description Logic Handbook: Theory, Implementation, and Applications. Cambridge University Press, 2003.
[25] Donato Malerba et al. Inducing Multi-Level Association Rules from Multiple Relations. Machine Learning, 2004.
[26] Francesca A. Lisi et al. Foundations of Onto-Relational Learning. ILP, 2008.
[27] Alan M. Frisch. Sorted Downward Refinement: Building Background Knowledge into a Refinement Operator for Inductive Programming. ILP, 1999.
[28] Diego Calvanese et al. DL-Lite: Practical Reasoning for Rich DLs. Description Logics, 2004.
[29] Riccardo Rosati et al. On the Decidability and Complexity of Integrating Ontologies and Rules. Journal of Web Semantics, 2005.
[30] Francesca A. Lisi et al. Efficient Evaluation of Candidate Hypotheses in AL-log. ILP, 2004.
[31] Riccardo Rosati et al. DL+log: Tight Integration of Description Logics and Disjunctive Datalog. KR, 2006.
[32] Matthias Jarke et al. Logic Programming and Databases. Expert Database Workshop, 1984.
[33] Tom M. Mitchell et al. Generalization as Search. 2002.
[34] Anthony G. Cohn et al. Thoughts and Afterthoughts on the 1988 Workshop on Principles of Hybrid Reasoning. AI Magazine, 1991.