ARIA: Asymmetry Resistant Instance Alignment

We study the problem of instance alignment between knowledge bases (KBs). Existing approaches exploit the "symmetry" of structure and information across KBs and therefore suffer in the presence of asymmetry, which is frequent because KBs are built independently. Specifically, we observe three types of asymmetry: in concepts, in features, and in structures. Our goal is to identify key techniques that reduce the accuracy loss caused by each type of asymmetry, and then to design the Asymmetry-Resistant Instance Alignment framework (ARIA). ARIA uses two-phase blocking methods that account for concept and feature asymmetries, together with a novel similarity measure that overcomes structure asymmetry. Compared to a state-of-the-art method, ARIA increased precision by 19% and recall by 2%, and decreased processing time by more than 80% when matching large-scale real-life KBs.
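For readers unfamiliar with blocking, the following is a minimal, generic sketch of the idea: instances are grouped by a cheap key so that only pairs sharing a block are compared. It is not ARIA's two-phase blocking, whose details the abstract does not give. The function names, the token-based key, and the toy KBs are all assumptions made for illustration.

```python
from collections import defaultdict
from itertools import product


def blocking_keys(label):
    """Lowercased word tokens of a label serve as (hypothetical) blocking keys."""
    return set(label.lower().split())


def candidate_pairs(kb_a, kb_b):
    """Return cross-KB instance pairs that share at least one blocking key."""
    # Each block holds the instances of KB A and KB B that emit the same key.
    blocks = defaultdict(lambda: (set(), set()))
    for inst, label in kb_a.items():
        for key in blocking_keys(label):
            blocks[key][0].add(inst)
    for inst, label in kb_b.items():
        for key in blocking_keys(label):
            blocks[key][1].add(inst)
    # Only pairs inside a common block become alignment candidates.
    pairs = set()
    for left, right in blocks.values():
        pairs.update(product(left, right))
    return pairs


# Toy KBs (assumed data): instance id -> label.
kb_a = {"a1": "Barack Obama", "a2": "Paris"}
kb_b = {"b1": "Obama Barack H.", "b2": "Paris France", "b3": "Berlin"}
pairs = candidate_pairs(kb_a, kb_b)
print(sorted(pairs))
```

In this toy case only 2 of the 6 possible cross-KB pairs survive blocking, which is the kind of saving that makes a more expensive similarity measure affordable on the remaining candidates.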
