This paper presents a simple and adaptable matching method for web directories, catalogs and OWL ontologies. Building on a well-known Knowledge Discovery in Databases model, the association rule paradigm, the method is original in being both extensional and asymmetric. It works at the terminological level (by selecting concept-relevant terms contained in documents) and can discover both equivalence and subsumption relations between entities (concepts and properties). The method relies on the implication intensity measure, a probabilistic model of deviation from independence. Significant rules between concepts (or properties) are selected using two criteria, which assess the implication quality and the generativity of a rule, respectively. Finally, the proposed method is evaluated on two benchmarks: the first contains two conceptual hierarchies populated with textual documents, and the second is composed of OWL ontologies.
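The following is a minimal sketch, not the paper's implementation. It shows how the implication intensity of a rule a → b can be computed from extensional counts using Gras's Poisson approximation. Under independence, the expected number of counterexamples is λ = n_a · n_¬b / n. The intensity is φ(a → b) = 1 − P(X ≤ n_{a∧¬b}), where X follows a Poisson(λ) distribution. The helper names extensional_counts and match_relation, the 0.95 threshold, and the reading of a → b as "a is subsumed by b" are illustrative assumptions. The generativity criterion is not modelled, so this is not the paper's full rule-selection procedure.

```python
import math


def implication_intensity(n: int, n_a: int, n_b: int, n_a_not_b: int) -> float:
    """Implication intensity of a rule a -> b (Poisson approximation).

    n         -- total number of observations
    n_a       -- observations in the extension of a
    n_b       -- observations in the extension of b
    n_a_not_b -- counterexamples: observations of a that are not in b
    """
    # Expected number of counterexamples if a and b were independent.
    lam = n_a * (n - n_b) / n
    # Cumulative Poisson probability P(X <= n_a_not_b), summed term by term.
    # Note: math.exp(-lam) underflows for very large lam; this sketch
    # targets moderate counts only.
    term = math.exp(-lam)
    cdf = term
    for k in range(1, n_a_not_b + 1):
        term *= lam / k
        cdf += term
    # Few counterexamples relative to chance -> intensity close to 1.
    return max(0.0, 1.0 - cdf)


def extensional_counts(ext_a: set, ext_b: set, n: int) -> tuple:
    """Hypothetical helper: counts for a -> b from extensions (sets of ids)."""
    return n, len(ext_a), len(ext_b), len(ext_a - ext_b)


def match_relation(phi_ab: float, phi_ba: float, threshold: float = 0.95) -> str:
    """Hypothetical rule combining the two asymmetric implications."""
    a_to_b = phi_ab >= threshold
    b_to_a = phi_ba >= threshold
    if a_to_b and b_to_a:
        return "equivalence"
    if a_to_b:
        return "a subsumed by b"
    if b_to_a:
        return "b subsumed by a"
    return "none"


if __name__ == "__main__":
    n = 100                           # hypothetical number of observations
    ext_a = set(range(20))            # hypothetical extension of concept a
    ext_b = set(range(2, 32))         # hypothetical extension of concept b
    phi_ab = implication_intensity(*extensional_counts(ext_a, ext_b, n))
    phi_ba = implication_intensity(*extensional_counts(ext_b, ext_a, n))
    print(f"phi(a->b) = {phi_ab:.4f}, phi(b->a) = {phi_ba:.4f}")
    print(match_relation(phi_ab, phi_ba))
```

The code computes φ(a → b) and φ(b → a) separately. A pair of concepts can therefore pass the test in one direction only, which yields a subsumption rather than an equivalence. This is the sense in which the approach is asymmetric. For the example extensions, n_a = 20, n_b = 30 and n_{a∧¬b} = 2. These counts give λ = 14 and φ(a → b) = 1 − 113·e^{−14} ≈ 0.9999.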