Methods and Techniques for Ontology Matching and Evolution in Open Distributed Systems

In open distributed systems, many different nodes, possibly spanning multiple organizations, need to share resources (e.g., data, documents, services) provided by other nodes. To address this requirement, ontologies are exploited to describe data and resources in a way that is understandable and usable by the target user community. Exploiting ontologies for knowledge sharing and resource discovery, in turn, requires ontology matching and evolution capabilities. This thesis abstract addresses these requirements by proposing methods and techniques for ontology matching and evolution designed to cope with the specific requirements of open distributed systems.
1 The research question of the thesis
Ontologies are generally recognized as an essential tool for enabling communication and knowledge sharing among distributed users and applications, since they provide a common understanding of a domain of interest. Driven by the vision of the Semantic Web, a large body of research has focused on ontologies, producing methods and tools that cover the entire ontology life cycle, from design to deployment and reuse. In practice, in distributed contexts the knowledge of interest is generally provided by many different ontologies, which specify the formal semantics of data for different intelligent services in order to support information sharing, search, retrieval, and transformation [1, 10].
With respect to this scenario, the thesis addresses three general requirements: (i) in a multi-ontology context, flexible matching techniques are required to provide semantic mappings among ontologies for the discovery and sharing of knowledge and data; (ii) evolution techniques are required to support the acquisition of new knowledge from other ontologies while preserving ontology consistency; (iii) matching and evolution techniques must be designed to cope with the specific requirements of open distributed systems.
1.1 Emerging requirements in open distributed systems
In open distributed systems such as Peer-to-Peer networks and Grids, many different nodes, possibly spanning multiple organizations, need to share resources (e.g., data, documents, services) provided by other nodes. In particular, the following features affect knowledge sharing and evolution in this context and need to be addressed by appropriate methods and techniques: (i) dynamism of the system: nodes may join and leave the network at any moment; (ii) autonomy of nodes: each node is responsible for managing and representing its own knowledge; (iii) absence of a-priori agreement on the ontology vocabulary and language used for knowledge specification; (iv) equality of node responsibilities: no centralized node with coordinating tasks exists, and each node implements its own facilities for interacting with other nodes for knowledge sharing and evolution. A general problem in such a context is dynamic knowledge discovery. By dynamic knowledge discovery we mean the capability of finding knowledge in the network about existing resources that best match the requirements of a given request for one or more target resources.
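To make the notion of dynamic knowledge discovery more concrete, the following Python sketch ranks the resource descriptions advertised by peers against a requested concept. It is a toy illustration under stated assumptions, not the method proposed in the thesis: each description is reduced to a concept name plus a set of property names, the `affinity` score (an equally weighted mix of name-token overlap and property overlap, both measured with the Jaccard coefficient) and the `threshold` value are hypothetical choices, and the network is a plain in-memory dictionary rather than a real Peer-to-Peer protocol.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Concept:
    """Hypothetical resource description: a concept name and its properties."""
    name: str
    properties: frozenset = field(default_factory=frozenset)


def jaccard(a: set, b: set) -> float:
    """Jaccard overlap |a & b| / |a | b|; defined as 0.0 when both sets are empty."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def affinity(target: Concept, candidate: Concept, name_weight: float = 0.5) -> float:
    """Toy affinity in [0, 1]: weighted mix of name-token and property overlap.

    Names are split on underscores, so "used_car" shares the token "car" with "car".
    """
    name_sim = jaccard(set(target.name.lower().split("_")),
                       set(candidate.name.lower().split("_")))
    prop_sim = jaccard(set(target.properties), set(candidate.properties))
    return name_weight * name_sim + (1 - name_weight) * prop_sim


def discover(request: Concept, network: dict, threshold: float = 0.4) -> list:
    """Return (peer, concept name, score) for every advertised concept whose
    affinity with the request reaches the threshold, best match first."""
    hits = []
    for peer, ontology in network.items():
        for concept in ontology:
            score = affinity(request, concept)
            if score >= threshold:
                hits.append((peer, concept.name, round(score, 2)))
    return sorted(hits, key=lambda hit: hit[2], reverse=True)


# Usage: two peers advertise concepts; the request asks for cars.
network = {
    "peerA": [Concept("used_car", frozenset({"price", "model", "mileage"}))],
    "peerB": [Concept("car", frozenset({"price", "model"})),
              Concept("hotel", frozenset({"price", "stars"}))],
}
request = Concept("car", frozenset({"price", "model", "year"}))
print(discover(request, network))
# [('peerB', 'car', 0.83), ('peerA', 'used_car', 0.5)]
```

A realistic matcher would also exploit a thesaurus (so that, e.g., "car" and "automobile" are recognized as synonyms) and the structure of each ontology; the sketch only shows the shape of the problem: a request, a similarity measure, and a ranked list of candidate resources gathered from autonomous nodes.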
For example, in Peer-to-Peer networks the problem of sharing and disseminating contents is crucial, and recent research aims to provide techniques for evolving from basic Peer-to-Peer networks, which support only file exchange with simple filenames as metadata, to more complex systems such as schema-based Peer-to-Peer networks, which support the exchange of complex contents (e.g., documents, relational data) by exploiting explicit schemas to describe knowledge, usually with RDF and thematic ontologies as metadata [8, 9]. As another example, the resource discovery problem in Grids involves the assignment of resources to tasks, given task requirements and resource policies [14]. A common issue in both Peer-to-Peer networks and Grids is that data and resources need to be described, by means of ontologies, in a way that is understandable and usable by the target user community. To exploit ontologies for knowledge sharing and resource discovery, methods and techniques for ontology matching and evolution are required. The general goal of ontology matching techniques is to compare ontological descriptions in order to find concepts that have a semantic affinity with a target concept. The goal of ontology evolution is to increase the knowledge of each node of an open distributed system by acquiring resource descriptions from the ontologies of the other nodes.
2 Relevant related work
In this section, we briefly discuss some relevant work on open distributed systems, as well as some examples of different approaches to ontology matching.
2.1 Open distributed systems
In the following, we summarize some relevant examples of open distributed systems that use metadata descriptions of their resources.
Edutella. Edutella [9] is an open source project that provides an infrastructure for sharing metadata in RDF format. It applies the Peer-to-Peer model using the JXTA protocol. The network is segmented into thematic clusters.
In each cluster, a mediator semantically integrates source metadata. Edutella is an example of a hybrid Peer-to-Peer architecture, in that each source sends queries to the mediator of its own cluster, and the mediator returns a list of nodes eligible to offer semantically related information. The actual data access takes place over direct network connections among peers. The mediator handles a request either directly, by answering queries using its own integrated schema, or indirectly, by querying other cluster mediators.
Data Mapping. Data Mapping [11] describes an approach to determine and handle mappings among heterogeneous data sources in a Peer-to-Peer framework. It is an example of a pure Peer-to-Peer architecture: network nodes are truly equivalent in functionality and capabilities, and they interact with each other using a Gnutella-like protocol. Each node determines semantic mappings among instances of its entities and ensures mapping consistency by interacting with domain experts. These relations are shared with other peers, which run a comparison and search algorithm to create new relations between the received mappings and their own data schemas. The results are redistributed in turn, progressively increasing the knowledge of each community member.
SWAP. The SWAP [8] project aims at overcoming the lack of semantics in current Peer-to-Peer systems. To this end, it introduces an RDF(S) metadata model for encoding semantic information, which allows peers to handle heterogeneous and even contradictory views on the domain of interest. Each peer implements an ontology extraction method that derives, from its different information sources, an RDF(S) description (ontology) compatible with the SWAP metadata model. Query processing over such ontologies is performed with the SeRQL query language: peers storing knowledge semantically related to a target concept are located through SeRQL views defined on specific similarity measures.
Views from external peers are integrated through an ontology merging method that extends the knowledge of the receiving peer according to a rating model.
2.2 Ontology matching
In the following, we present examples of different approaches to ontology matching in open distributed systems.
GLUE. GLUE [7] exploits machine learning techniques to find semantic mappings between concepts stored in distinct and autonomous ontologies. Given two distinct ontologies, the discovery of mappings between their concepts is based on a similarity measure defined through the joint probability distribution of the concepts: the similarity between two concepts is computed from the likelihood that an instance belongs to both of them. Based on these probabilistic measures, two base learning techniques are applied to build a similarity matrix that predicts the semantic affinity between concepts. A relaxation labeling procedure is then performed to improve the accuracy of these affinity predictions, and domain-independent and domain-dependent constraints are exploited in this refinement process.
Edamok. Edamok [2] is a research project focused on semantic interoperability issues in Peer-to-Peer systems. The project implements the KEx (Knowledge Exchange) Peer-to-Peer system, which aims to realize knowledge sharing among peer communities of interest (called federations). The system is based on the notion of peer context, which represents the interests of a peer. To identify semantic mappings between concepts stored in distinct peers, the system uses the Ctx-Match algorithm. This algorithm compares the knowledge contained in different contexts, looking for semantic mappings that denote peers interested in similar concepts. These mappings are stored to help the query resolution components direct queries to peers that store relevant information.
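The step of storing mappings so that query resolution can direct queries to the right peers can be sketched as a small lookup structure. This is an illustrative assumption, not the KEx implementation: the `MappingStore` class, its method names, and the relation labels `equivalent`, `more_general`, and `more_specific` are hypothetical, and the mappings are assumed to have been produced beforehand by a matcher such as Ctx-Match.

```python
from collections import defaultdict


class MappingStore:
    """Hypothetical per-peer store of semantic mappings to remote concepts."""

    def __init__(self):
        # local concept -> list of (peer, remote concept, relation label)
        self._by_concept = defaultdict(list)

    def add(self, local_concept: str, peer: str, remote_concept: str, relation: str) -> None:
        """Record that `remote_concept` on `peer` is related to `local_concept`."""
        self._by_concept[local_concept].append((peer, remote_concept, relation))

    def route(self, local_concept: str,
              accepted=("equivalent", "more_specific")) -> list:
        """Return (peer, remote concept) pairs to which a query on
        `local_concept` should be forwarded, in insertion order.

        Only mappings whose relation label is in `accepted` are used; by default,
        peers whose concept is equivalent to or more specific than the local one.
        """
        # .get avoids creating empty entries for unknown concepts
        return [(peer, remote)
                for peer, remote, relation in self._by_concept.get(local_concept, [])
                if relation in accepted]


store = MappingStore()
store.add("car", "peer7", "automobile", "equivalent")
store.add("car", "peer3", "vehicle", "more_general")
store.add("car", "peer9", "sports_car", "more_specific")
print(store.route("car"))
# [('peer7', 'automobile'), ('peer9', 'sports_car')]
```

Forwarding only to equivalent or more specific concepts is a conservative choice in this sketch: every instance of a more specific concept is also an instance of the requested one, so the returned answers stay relevant, while `more_general` mappings can be enabled through `accepted` when recall matters more than precision.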
Ctx-Match is based on a semantic explication phase, where concepts are associated with their correct meaning with respect to their context, and on a semantic comparison phase, where concepts are translated into logical axioms and matched. The algorithm implements a description logic approach: mapping discovery is reduced to the problem of checking a set of logical relations.
KAON. KAON [12] is an open-source ontology management infrastructure tailored for business applications. A modeling language based on RDFS has been developed to provide a unified environment for ontology creation, evolution, and reuse. KAON is not specifically designed for a Peer-to-Peer environment. However, the authors show how different nodes can interact for searching and reusing diffe

[1] Stefan Decker, et al. Ontology-Based Resource Matching in the Grid - The Grid Meets the Semantic Web, 2003, SEMWEB.

[2] Silvana Castano, et al. Ontologies and Matching Techniques for Peer-based Knowledge Sharing, 2003, CAiSE Short Paper Proceedings.

[3] Boris Motik, et al. An infrastructure for searching, reusing and evolving distributed ontologies, 2003, WWW '03.

[4] Pedro M. Domingos, et al. Learning to map between ontologies on the semantic web, 2002, WWW '02.

[5] James A. Hendler, et al. The Semantic Web, 2001, Scientific American.

[6] George A. Miller, et al. WordNet: A Lexical Database for English, 1995, HLT.

[7] Luciano Serafini, et al. A SAT-Based Algorithm for Context Matching, 2003, CONTEXT.

[8] Steffen Staab, et al. A Metadata Model for Semantics-Based Peer-to-Peer Systems, 2003.

[9] Renée J. Miller, et al. Data mapping in peer-to-peer systems: Semantics and algorithmic issues, 2003, SIGMOD 2003.

[10] Silvana Castano, et al. HELIOS: a general framework for ontology-based knowledge sharing and evolution in P2P systems, 2003, 14th International Workshop on Database and Expert Systems Applications, 2003. Proceedings.

[11] Silvana Castano, et al. H-MATCH: an Algorithm for Dynamically Matching Ontologies in Peer-based Systems, 2003, SWDB.

[12] Silvana Castano, et al. Ontology-Addressable Contents in P2P Networks, 2003.

[13] Tore Risch, et al. EDUTELLA: a P2P networking infrastructure based on RDF, 2002, WWW.