A Clustering-Based Approach for Large-Scale Ontology Matching

Schema and ontology matching have attracted a great deal of interest among researchers. Despite the advances achieved, matching large schemas and ontologies remains a real challenge because it is a time-consuming and memory-intensive process. We therefore propose a scalable, clustering-based matching approach that breaks a large matching problem into smaller ones. In particular, we first introduce a structure-based clustering approach that partitions each schema graph into a set of disjoint subgraphs (clusters). We then propose a new measure that efficiently identifies similar clusters between the cluster sets of the two schemas, yielding a set of small matching tasks. Finally, we adopt the matching prototype COMA++ to solve the individual matching tasks and combine their results. The experimental analysis shows that the proposed method yields encouraging and significant improvements.
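
The three-step pipeline described above (partition, pair up similar clusters, match within pairs) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration and is not the method of the paper: `partition` simply treats each subtree under the root as a cluster, `cluster_similarity` is a Jaccard overlap of name tokens rather than the proposed measure, and `match_pair` is a string-similarity stand-in for COMA++. The toy schemas and both thresholds are likewise hypothetical.

```python
from difflib import SequenceMatcher
from itertools import product

# Hypothetical toy schemas as adjacency lists (parent -> children).
schema_a = {
    "shop": ["customer", "order"],
    "customer": ["customer_name", "customer_address"],
    "order": ["order_id", "order_date"],
}
schema_b = {
    "store": ["client", "purchase_order"],
    "client": ["client_name", "client_address"],
    "purchase_order": ["order_id", "order_date"],
}

def partition(tree, root):
    """Stand-in for structure-based clustering: one cluster per root subtree."""
    clusters = []
    for child in tree.get(root, []):
        members, stack = set(), [child]
        while stack:
            node = stack.pop()
            members.add(node)
            stack.extend(tree.get(node, []))
        clusters.append(frozenset(members))
    return clusters

def tokens(cluster):
    """Lower-cased name tokens of all elements in a cluster."""
    return {tok for name in cluster for tok in name.lower().split("_")}

def cluster_similarity(c1, c2):
    """Assumed measure: Jaccard overlap of name tokens (not the paper's measure)."""
    t1, t2 = tokens(c1), tokens(c2)
    union = t1 | t2
    return len(t1 & t2) / len(union) if union else 0.0

def similar_cluster_pairs(clusters1, clusters2, threshold=0.3):
    """Keep only cluster pairs similar enough to be worth matching."""
    return [(a, b) for a, b in product(clusters1, clusters2)
            if cluster_similarity(a, b) >= threshold]

def name_score(a, b):
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def match_pair(c1, c2, threshold=0.8):
    """Stand-in element matcher (not COMA++): best string match per element."""
    result = []
    for a in sorted(c1):
        best = max(sorted(c2), key=lambda b: name_score(a, b))
        score = name_score(a, best)
        if score >= threshold:
            result.append((a, best, round(score, 2)))
    return result

def match_schemas(tree1, root1, tree2, root2):
    """Solve the small matching tasks independently and combine the results."""
    matches = []
    pairs = similar_cluster_pairs(partition(tree1, root1), partition(tree2, root2))
    for c1, c2 in pairs:
        matches.extend(match_pair(c1, c2))
    return matches

if __name__ == "__main__":
    for match in match_schemas(schema_a, "shop", schema_b, "store"):
        print(match)
```

On these toy schemas only two of the four possible cluster pairs pass the similarity filter, so the element-level matcher runs on two small tasks instead of comparing every element of one schema with every element of the other. That reduction in comparisons is the intended source of the time and memory savings.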
