SIGMa: simple greedy matching for aligning large knowledge bases

The Internet has enabled the creation of a growing number of large-scale knowledge bases in a variety of domains containing complementary information. Tools for automatically aligning these knowledge bases would make it possible to unify many sources of structured knowledge and answer complex queries. However, the efficient alignment of large-scale knowledge bases still poses a considerable challenge. Here, we present Simple Greedy Matching (SiGMa), a simple algorithm for aligning knowledge bases with millions of entities and facts. SiGMa is an iterative propagation algorithm that leverages both the structural information from the relationship graph and flexible similarity measures between entity properties in a greedy local search, which makes it scalable. Despite its greedy nature, our experiments indicate that SiGMa can efficiently match some of the world's largest knowledge bases with high accuracy. We provide additional experiments on benchmark datasets which demonstrate that SiGMa can outperform state-of-the-art approaches both in accuracy and efficiency.

[1]  Ronald Rousseau,et al.  Similarity measures in scientometric research: The Jaccard index versus Salton's cosine formula , 1989, Inf. Process. Manag..

[2]  James C. French,et al.  Applications of approximate word matching in information retrieval , 1997, CIKM '97.

[3]  Christiane Fellbaum,et al.  Book Reviews: WordNet: An Electronic Lexical Database , 1999, CL.

[4]  Pradeep Ravikumar,et al.  A Comparison of String Distance Metrics for Name-Matching Tasks , 2003, IIWeb.

[5]  Yannis Kalfoglou,et al.  Ontology mapping: the state of the art , 2003, The Knowledge Engineering Review.

[6]  Hermann Ney,et al.  A Systematic Comparison of Various Statistical Alignment Models , 2003, CL.

[7]  Steffen Staab,et al.  QOM - Quick Ontology Mapping , 2004, GI Jahrestagung.

[8]  Yuzhong Qu,et al.  GMO: A Graph Matching for Ontologies , 2005, Integrating Ontologies.

[9]  Ben Taskar,et al.  A Discriminative Matching Approach to Word Alignment , 2005, HLT.

[10]  Stefanos D. Kollias,et al.  A String Metric for Ontology Alignment , 2005, SEMWEB.

[11]  Hyoil Han,et al.  A survey on ontology mapping , 2006, SGMD.

[12]  Christian Bessiere,et al.  Constraint Propagation , 2006, Handbook of Constraint Programming.

[13]  Ben Taskar,et al.  Word Alignment via Quadratic Assignment , 2006, NAACL.

[14]  Gerhard Weikum,et al.  WWW 2007 / Track: Semantic Web Session: Ontologies ABSTRACT YAGO: A Core of Semantic Knowledge , 2022 .

[15]  Lise Getoor,et al.  Collective entity resolution in relational data , 2007, TKDD.

[16]  Ming Mao,et al.  Ontology Mapping: An Information Retrieval and Interactive Activation Network Based Approach , 2007, ISWC/ASWC.

[17]  Silvana Castano,et al.  On the Ontology Instance Matching Problem , 2008, 2008 19th International Workshop on Database and Expert Systems Applications.

[18]  Jérôme Euzenat,et al.  Ten Challenges for Ontology Matching , 2008, OTM Conferences.

[19]  Tim Berners-Lee,et al.  Linked data on the web (LDOW2008) , 2008, WWW.

[20]  Panos M. Pardalos,et al.  Quadratic Assignment Problem , 1997, Encyclopedia of Optimization.

[21]  Yi Li,et al.  RiMOM: A Dynamic Multistrategy Ontology Alignment Framework , 2009, IEEE Transactions on Knowledge and Data Engineering.

[22]  Christopher Ré,et al.  Large-Scale Deduplication with Constraints Using Dedupalog , 2009, 2009 IEEE 25th International Conference on Data Engineering.

[23]  Mathieu d'Aquin,et al.  Large scale integration of senses for the semantic web , 2009, WWW '09.

[24]  Elaine Shi,et al.  Link prediction by de-anonymization: How We Won the Kaggle Social Network Challenge , 2011, The 2011 International Joint Conference on Neural Networks.

[25]  Serge Abiteboul,et al.  PARIS: Probabilistic Alignment of Relations, Instances, and Schema , 2011, Proc. VLDB Endow..

[26]  Nilesh N. Dalvi,et al.  Large-Scale Collective Entity Matching , 2011, Proc. VLDB Endow..

[27]  Yuzhong Qu,et al.  A self-training approach for resolving object coreference on the semantic web , 2011, WWW.

[28]  Gerhard Weikum,et al.  LINDA: distributed web-of-data-scale entity matching , 2012, CIKM.

[29]  Peter Christen,et al.  A Survey of Indexing Techniques for Scalable Record Linkage and Deduplication , 2012, IEEE Transactions on Knowledge and Data Engineering.

[30]  J. Euzenat,et al.  Ontology Matching , 2007, Springer Berlin Heidelberg.