Vector Representation for Sub-Graph Encoding to Resolve Entities

Abstract

Entity resolution, i.e., determining whether two mentions refer to the same entity, is a crucial step in combining evidence from multiple sources, and is a problem encountered in a wide range of areas, from modeling causes of cancer to identifying terrorist networks. Entity mentions are represented by attributes and relations to other entities. However, different sources often name attributes and relations differently and specify relationships differently, which leads to low entity resolution precision and recall. Our contribution is based on our observation that relationships are more reliable than attributes when comparison is based on relational similarity rather than exact matches. Traditional graph comparison techniques rely on finding precise matches of a significant part of the graph structure, and require custom comparison functions for every type of attribute and every type of relation. This leads to a system that is difficult to maintain and enhance. We encode entity nodes and their graph neighborhoods in semantic vectors, efficiently index the vectors, and calculate vector similarity. Our approach is insensitive to small variations in relational graph representation. It uses only simple vector addition, permutation, and difference, leading to reduced computational complexity. Our preliminary experiment shows 83.05% accuracy.
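As a rough illustration of how a node's neighborhood might be encoded using only addition, permutation, and difference, consider the minimal sketch below. It is not the authors' implementation. The dimensionality, the hash-seeded bipolar vectors, the use of a cyclic shift as the permutation, the relation names, and the sample entities are all assumptions made for this example.

```python
import hashlib
import math
import random

DIM = 2048  # assumed dimensionality, chosen only for this sketch


def random_vector(label, dim=DIM):
    """Deterministic pseudo-random +1/-1 vector for a label (assumed scheme)."""
    digest = hashlib.sha256(label.encode("utf-8")).digest()
    rng = random.Random(int.from_bytes(digest[:8], "big"))
    return [rng.choice((-1, 1)) for _ in range(dim)]


def permute(vec, shift):
    """Cyclic shift by `shift` positions, standing in for the permutation."""
    shift %= len(vec)
    return vec[-shift:] + vec[:-shift]


def add(a, b):
    """Element-wise vector addition."""
    return [x + y for x, y in zip(a, b)]


def difference(a, b):
    """Element-wise vector difference."""
    return [x - y for x, y in zip(a, b)]


def norm(vec):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in vec))


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    na, nb = norm(a), norm(b)
    if na == 0 or nb == 0:
        return 0.0
    return sum(x * y for x, y in zip(a, b)) / (na * nb)


def encode_neighborhood(edges, relation_shift):
    """Sum the neighbor vectors, each permuted by its relation's shift."""
    total = [0] * DIM
    for relation, neighbor in edges:
        total = add(total, permute(random_vector(neighbor), relation_shift[relation]))
    return total


# Hypothetical relation types, each assigned its own shift amount.
RELATION_SHIFT = {"works_at": 1, "lives_in": 2, "coauthor": 3}

# Two mentions that share two of three neighbors, and one unrelated mention.
mention_a = [("works_at", "Acme"), ("lives_in", "Boston"), ("coauthor", "J. Smith")]
mention_b = [("works_at", "Acme"), ("lives_in", "Boston"), ("coauthor", "R. Jones")]
mention_c = [("works_at", "Globex"), ("lives_in", "Denver"), ("coauthor", "P. Lee")]

vec_a = encode_neighborhood(mention_a, RELATION_SHIFT)
vec_b = encode_neighborhood(mention_b, RELATION_SHIFT)
vec_c = encode_neighborhood(mention_c, RELATION_SHIFT)

sim_overlap = cosine(vec_a, vec_b)    # mostly shared neighborhood
sim_unrelated = cosine(vec_a, vec_c)  # disjoint neighborhood
dist_overlap = norm(difference(vec_a, vec_b))
dist_unrelated = norm(difference(vec_a, vec_c))
```

In this sketch, the two mentions with overlapping neighborhoods score a higher cosine similarity and a smaller difference norm than the unrelated pair, even though their neighborhoods do not match exactly. That is the kind of tolerance to small graph variations the abstract describes.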