Paper Information - Information Brokering Across Heterogeneous Digital Data

Abstraction Level Incompatibility
In this case, heterogeneities arise due to differing levels of abstraction at which the entities may be represented (Section 1.4). These heterogeneities can be resolved by mapping the entities to concepts at appropriate levels of abstraction in the domain ontology. If no appropriate concepts exist, one may have to construct a c-context from concepts in the domain ontology. If the concepts come from different ontologies, one may have to define terminological relationships across them. Examples of terminological relationships are hyponyms/hypernyms in the case of generalization/specialization and holonyms/meronyms in the case of aggregations.

Schematic Discrepancies
In this case, heterogeneities arise when data in one database corresponds to metadata in another (Section 1.5). One form of this heterogeneity is the attribute entity conflict. It can be resolved by mapping corresponding entities and attributes to appropriate c-contexts. Other forms of this heterogeneity require a mechanism to specify correspondences between data in one database and metadata in another, which is beyond the scope of this book.

We have discussed above how schematic details and heterogeneities can be abstracted out by using c-contexts associated with mapping expressions and transformer functions. The c-contexts so constructed are also used to capture the information content. From the perspective of information brokering, they may also be viewed as an intermediate language in which the information content of the underlying databases is represented. The two perspectives from which the c-contexts may be constructed are as follows.

Bottom-Up Perspective
In this case, the focus is on abstracting out the representational and schematic details.
Thus, c-contexts are used as views on objects in the underlying databases, and the set of instances exported to the information broker on the GII obeys the view constraints. This is the perspective primarily followed in (Kashyap and Sheth, 1996).

Top-Down Perspective
In this case, the focus is on modeling and specifying information in an application or domain specific manner. Thus, it is assumed that there exist underlying objects in the databases for concepts in the ontologies. Mappings are then appropriately combined to determine the object instances in the underlying databases that satisfy the constraints specified in the c-contexts. This perspective is taken in (Mena et al., 1996b). A similar perspective has been taken in (Borgida and Brachman, 1993) for populating description logic (DL) expressions.

2.2 C-CONTEXTS: A PARTIAL REPRESENTATION

Several efforts attempt to represent the similarity between two objects in databases. In (Larson et al., 1989), a fixed set of descriptors defines essential characteristics of attributes, and is used to generate mappings between them. We have discussed in (Kashyap and Sheth, 1996) how such descriptors do not guarantee semantic similarity. Thus, any representation of a c-context that can be described by a fixed set of descriptors is not appropriate. In our approach, the descriptors (or meta-attributes) are chosen dynamically to model characteristics of the application domain. It is not possible a priori to determine all possible meta-attributes that would completely characterize the semantics of an application domain. This leads to a partial representation of c-contexts.

We represent a c-context as a collection of contextual coordinates (meta-attributes) as follows:

    Context = <(C1, Expr1) (C2, Expr2) ... (Ck, Exprk)>

where
■ Ci, 1 ≤ i ≤ k, is a contextual coordinate denoting an aspect of a c-context
■ Ci may model some characteristic of the subject domain and may be obtained from a domain specific ontology (discussed later in this section)
■ Ci may model an implicit assumption in the design of a database.

Now, we explain the meaning of the symbols Ci and Expri by using examples and by enumerating the corresponding DL expressions. When using DL expressions, it is possible to define primitive classes and, in addition, to specify classes using intensional descriptions phrased in terms of necessary and sufficient properties that must be satisfied by their instances. The intensional descriptions may be used to express the collection of constraints that make up a c-context. Using the terminology of DL systems, each term may be modeled as either a concept or a role. Also, each Ci roughly corresponds to a role, and each Expri roughly corresponds to fillers for that role. Expri might be a term, a c-context, or a term associated with a c-context. Heuristics for modeling terms as contextual coordinates or their values are discussed later in this section. The DL expressions corresponding to c-contexts are summarized in Appendix 5.A.

We use the following example and terminology to explain how c-contexts capture information in the databases using terms from a domain ontology. Consider the following database objects:

    EMPLOYEE(SS#, Name, Salary Type, Dept, Affiliation)
    PUBLICATION(Id, Title, Journal)
    POSITION(Id, Title, Dept, Type)
    HAS-PUBLICATION(SS#, Id)
    HOLDS-POSITION(SS#, Id)

Let us now illustrate with examples how information content in these database objects can be captured with the help of terms organized as c-contexts in a domain specific ontology. Some relevant terminology is as follows.

■ term(O) and term(A) are terms corresponding to the database object O and attribute A at the intensional level.
We assume the existence of transformer functions between the domains of the terms (also referred to as the extension of the term) in the ontology, and the domains of the appropriate object or attribute in the database.
■ instance(V) is the instance corresponding to the data value V in the database. The data value might be a key or an object identifier. This might be implemented using a transformer function between the domains of the term to which the instance belongs in the ontology, and the domain of the appropriate object or attribute in the database.
■ Ext(Term) denotes the set of instances corresponding to the term in the ontology.
■ Cdef(O) is the definition context of a database object O and is typically used to specify assumptions in the design of the object. It may also be used to share a pre-determined extension of the object with the GII (denoted as OG).
■ O1 ∘ Cass(O1, O2) denotes the association of an object O1 with an association context. This may be used to represent relationships between the objects O1 and O2 with reference to an aspect of the application domain.
■ Cq denotes the context associated with a query Q posed to an information broker on the GII. The context makes explicit (partially) the semantics of the query. A user can consult concepts in ontologies and objects in a database to construct the query context.

Note: The predicate term should have one more argument identifying the ontology which is being used, as a database might contain information in more than one information domain. However, we can assume without loss of generality that one ontology is being used to capture the information in this database.
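The terminology above can be sketched in Python as a small registry. This is only an illustration under stated assumptions, not the authors' implementation: the class OntologyMapping, its methods register and export, the "empl:" identifier scheme, and the SS# values are all hypothetical. The pairing of EMPLOYEE with EmplConcept is the association the text identifies next, and a single ontology is assumed, as in the note above.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, Set


@dataclass
class OntologyMapping:
    """Hypothetical registry tying database names to ontology terms."""
    terms: Dict[str, str] = field(default_factory=dict)
    transformers: Dict[str, Callable[[str], str]] = field(default_factory=dict)
    extensions: Dict[str, Set[str]] = field(default_factory=dict)

    def register(self, db_name: str, term: str,
                 transformer: Callable[[str], str]) -> None:
        """Record term(db_name) and the transformer function for its domain."""
        self.terms[db_name] = term
        self.transformers[db_name] = transformer

    def term(self, db_name: str) -> str:
        """term(O) or term(A): ontology term for a database object or attribute."""
        return self.terms[db_name]

    def instance(self, db_name: str, value: str) -> str:
        """instance(V): map a data value (key or object id) into the term's domain."""
        return self.transformers[db_name](value)

    def export(self, db_name: str, values: Iterable[str]) -> None:
        """Add the instances for the given data values to Ext(term(db_name))."""
        ext = self.extensions.setdefault(self.term(db_name), set())
        ext.update(self.instance(db_name, v) for v in values)

    def ext(self, term: str) -> Set[str]:
        """Ext(Term): the set of instances known for an ontology term."""
        return self.extensions.get(term, set())


mapping = OntologyMapping()
# Assumed transformer: strip dashes from the SS# and add a made-up prefix.
mapping.register("EMPLOYEE", "EmplConcept",
                 lambda ssn: "empl:" + ssn.replace("-", ""))
# Made-up key values standing in for rows of EMPLOYEE.
mapping.export("EMPLOYEE", ["123-45-6789", "987-65-4321"])

print(mapping.term("EMPLOYEE"))
print(sorted(mapping.ext("EmplConcept")))
```

Keeping the transformer function separate from the term lookup mirrors the distinction drawn above between the intensional level (term) and the extensional level (instance, Ext).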
We can identify the following associations:

    term(EMPLOYEE) = EmplConcept
    term(EMPLOYEE.SS#) = EmplConcept.self
    term(EMPLOYEE.Name) = name
    term(EMPLOYEE.Dept) = hasEmployer
    term(EMPLOYEE.Affiliation) = hasAffiliation
    term(PUBLICATION) = PublConcept
    term(PUBLICATION.Id) = { hasArticle, PublConcept.self }
    term(PUBLICATION.Title) = hasTitle
    term(POSITION) = PostConcept
    term(POSITION.Id) = { hasPosition, PostConcept.self }
    term(HAS-PUBLICATION) = HasPublConcept
    term(HAS-PUBLICATION.Id) = { hasArticle, isAuthorOf }
    term(HAS-PUBLICATION.SS#) = hasAuthor
    term(HOLDS-POSITION) = HoldsPostConcept
    term(HOLDS-POSITION.SS#) = hasDesignee
    term(HOLDS-POSITION.Id) = { hasPosition, isDesigneeOf }

The value Expri of a contextual coordinate Ci can be represented in the following manner.

■ Expri can be a variable. It is used as a placeholder to elicit answers from the databases and impose constraints on them.

Example: Suppose we are interested in people who are authors and who hold a position (designee). We can represent the query context Cq as follows:

    Cq = <(isAuthorOf, X) (isDesigneeOf, Y)>

(For a detailed exposition of the various types of context, see Kashyap, 1997.)

The same thing can be expressed in a DL as follows:

    Cq = (AND Anything (ATLEAST 1 isAuthorOf) (ATLEAST 1 isDesigneeOf))

The terms isAuthorOf and isDesigneeOf are obtained from a domain specific ontology. From a modeling perspective, the above query expresses the users’ interest in all employees that hold a position and have authored a published article. In this particular case, it can be seen intuitively that objects that are instances of EmplConcept are the right candidates. This can be expressed in the following manner.

    Cq = (AND EmplConcept (ATLEAST 1 isAuthorOf) (ATLEAST 1 isDesigneeOf))

It may be noted here that we use variables in a very restricted manner for the specific purpose of retrieving relevant properties of the selected objects.
They are used only at the highest level of nesting, though the c-contexts can have an arbitrary level of nesting (since each Expri can be a c-context or a term associated with a c-context), and hence we do not need to perform complex nested unifications.

■ Expri can be a set.
– The set may be an enumeration of terms from a domain specific ontology.
– The set may be defined as the extension of an object or as elements from the domain of a type defined in the database.
– The set may be defined by posing constraints on pre-existing sets.
–
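The query context Cq from the variable example above can be sketched in Python as follows. This is a minimal illustration, assuming that a c-context is held as a list of (coordinate, Expr) pairs and that single uppercase letters denote variables. The helper names is_variable, to_dl and answer, the role tables, and all identifiers such as e1 or p1 are hypothetical. Only top-level variable Exprs are handled, in line with the restricted use of variables described in the text.

```python
from typing import Dict, List, Set, Tuple

# A c-context as an ordered list of (contextual coordinate, Expr) pairs.
Context = List[Tuple[str, str]]
# role name -> object -> set of fillers (toy stand-in for the databases)
Roles = Dict[str, Dict[str, Set[str]]]


def is_variable(expr: str) -> bool:
    """Assumed convention for this sketch: one uppercase letter is a variable."""
    return len(expr) == 1 and expr.isupper()


def to_dl(context: Context, concept: str = "Anything") -> str:
    """Render a context whose Exprs are all variables as a DL expression."""
    parts = [concept]
    for coordinate, expr in context:
        if not is_variable(expr):
            raise ValueError(f"only variable Exprs are handled here: {expr!r}")
        parts.append(f"(ATLEAST 1 {coordinate})")
    return "(AND " + " ".join(parts) + ")"


def answer(context: Context, roles: Roles,
           candidates: Set[str]) -> Dict[str, Dict[str, Set[str]]]:
    """Keep candidates with at least one filler per coordinate; bind variables."""
    result = {}
    for obj in candidates:
        bindings = {expr: roles.get(coordinate, {}).get(obj, set())
                    for coordinate, expr in context}
        if all(bindings.values()):  # an empty filler set fails ATLEAST 1
            result[obj] = bindings
    return result


cq: Context = [("isAuthorOf", "X"), ("isDesigneeOf", "Y")]

# Made-up role fillers, as if derived from HAS-PUBLICATION and HOLDS-POSITION.
roles: Roles = {
    "isAuthorOf": {"e1": {"p1", "p2"}, "e2": {"p3"}},
    "isDesigneeOf": {"e1": {"pos7"}, "e3": {"pos9"}},
}
employees = {"e1", "e2", "e3"}  # stand-in for Ext(EmplConcept)

print(to_dl(cq))
print(to_dl(cq, "EmplConcept"))
print(answer(cq, roles, employees))
```

In this toy data only e1 both authors an article and holds a position, so it is the only object returned, with X and Y bound to its fillers.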

Vipul Kashyap | Amit P. Sheth