Paper information - Similarity Flooding: A Versatile Graph Matching Algorithm (Extended Technical Report)

Matching elements of two data schemas or two data instances plays a key role in data warehousing, e-business, or even biochemical applications. In this paper we present a matching algorithm based on a fixpoint computation that is usable across different scenarios. The algorithm takes two graphs (schemas, catalogs, or other data structures) as input, and produces as output a mapping between corresponding nodes of the graphs. Depending on the matching goal, a subset of the mapping is chosen using filters. After our algorithm runs, we expect a human to check and if necessary adjust the results. As a matter of fact, we evaluate the "accuracy" of the algorithm by counting the number of needed adjustments. We conducted a user study, in which our accuracy metric was used to estimate the labor savings that the users could obtain by utilizing our algorithm to obtain an initial matching. Finally, we discuss how our matching algorithm is deployed as one of several high-level operators in an implemented testbed for managing information models and mappings.
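
The abstract's pipeline (two graphs in, a fixpoint computation over node-pair similarities, then a filter that selects a subset of the mapping) can be illustrated with a minimal Python sketch. This is not the authors' implementation. The function names similarity_flooding and best_match_per_node, the (source, label, target) edge triples, the uniform initial similarity of 1.0, the plain "old value plus propagated value" update, normalization by the maximum, and the stopping threshold are all assumptions made for illustration; the abstract itself does not specify them.

```python
from collections import defaultdict
from itertools import product


def similarity_flooding(edges_a, edges_b, initial=None, max_iter=100, eps=1e-4):
    """Simplified sketch: similarity of a node pair spreads to neighboring pairs."""
    # Pair up edges that carry the same label in both graphs. Each result
    # links the node pair (a, b) to the node pair (a2, b2).
    pair_edges = [((a, b), label, (a2, b2))
                  for (a, label, a2), (b, label_b, b2) in product(edges_a, edges_b)
                  if label == label_b]

    # Count same-labeled pair edges leaving and entering each node pair.
    out_count = defaultdict(int)
    in_count = defaultdict(int)
    for src, label, dst in pair_edges:
        out_count[(src, label)] += 1
        in_count[(dst, label)] += 1

    # Assumed weighting: a pair splits a total weight of 1.0 evenly among the
    # neighbors it reaches via one label, in forward and backward direction.
    weights = defaultdict(float)
    for src, label, dst in pair_edges:
        weights[(src, dst)] += 1.0 / out_count[(src, label)]
        weights[(dst, src)] += 1.0 / in_count[(dst, label)]

    pairs = {pair for link in weights for pair in link}
    if not pairs:
        return {}

    # Assumed starting point: every pair is equally similar unless told otherwise.
    sigma = {p: (initial or {}).get(p, 1.0) for p in pairs}

    for _ in range(max_iter):
        nxt = dict(sigma)
        for (src, dst), w in weights.items():
            nxt[dst] += sigma[src] * w          # flood similarity to neighbors
        top = max(nxt.values())
        nxt = {p: v / top for p, v in nxt.items()}  # keep values in (0, 1]
        delta = sum((nxt[p] - sigma[p]) ** 2 for p in pairs) ** 0.5
        sigma = nxt
        if delta < eps:                          # values stopped changing
            break
    return sigma


def best_match_per_node(sigma):
    """One possible filter: keep the highest-scoring partner for each node of graph A."""
    best = {}
    for (a, b), score in sigma.items():
        if a not in best or score > best[a][1]:
            best[a] = (b, score)
    return {a: b for a, (b, _) in best.items()}


if __name__ == "__main__":
    # Toy graphs, made up for this sketch.
    graph_a = [("a", "l1", "a1"), ("a", "l1", "a2"), ("a1", "l2", "a2")]
    graph_b = [("b", "l1", "b1"), ("b", "l2", "b2"), ("b2", "l2", "b1")]
    scores = similarity_flooding(graph_a, graph_b)
    print(best_match_per_node(scores))
```

The filter shown is only one choice. As the abstract notes, the subset of the mapping that is kept depends on the matching goal, and a human is still expected to review and adjust the result.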
