Efficient Entity Resolution Based on Sequence Rules

Entity resolution (ER) is the task of finding the data objects that refer to the same real-world entity. When ER is performed on relations, the crucial operator is record matching, which judges whether two tuples refer to the same real-world entity. Record matching is a longstanding problem. However, with the massive and complex data found in modern applications, current methods cannot meet these applications' requirements. We present sequence-rule-based record matching (SeReMatching), which considers both the values of the attributes and their importance in record matching. With the help of a modified Bloom filter, the algorithm greatly increases checking speed and reduces the complexity of entity resolution to almost O(n). Extensive experiments are performed to evaluate our methods.
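The near-O(n) claim rests on membership tests that take constant time per record. The abstract does not describe how the authors modified the Bloom filter, so the sketch below uses a standard Bloom filter instead. It makes one pass over the records and flags each record whose match key may already have been seen. The key function `match_key`, the driver `find_candidate_duplicates`, and the parameters `m` (number of bits) and `k` (number of hash functions) are hypothetical illustrations. They are not the paper's sequence rules or its data structure.

```python
import hashlib


class BloomFilter:
    """Standard (unmodified) Bloom filter: m bits, k hash positions per item.

    It never gives false negatives. A positive answer only means "possibly seen".
    """

    def __init__(self, m: int = 1 << 16, k: int = 4) -> None:
        self.m = m
        self.k = k
        self.bits = bytearray((m + 7) // 8)

    def _positions(self, item: str):
        # Derive k bit positions from one SHA-256 digest via double hashing.
        digest = hashlib.sha256(item.encode("utf-8")).digest()
        h1 = int.from_bytes(digest[:8], "big")
        h2 = int.from_bytes(digest[8:16], "big") | 1  # odd step size
        for i in range(self.k):
            yield (h1 + i * h2) % self.m

    def add(self, item: str) -> None:
        for p in self._positions(item):
            self.bits[p // 8] |= 1 << (p % 8)

    def might_contain(self, item: str) -> bool:
        return all(self.bits[p // 8] & (1 << (p % 8)) for p in self._positions(item))


def match_key(record: dict) -> str:
    # Hypothetical key: normalized name plus year. This is NOT the paper's sequence rules.
    return f"{record['name'].strip().lower()}|{record['year']}"


def find_candidate_duplicates(records: list) -> list:
    """Single pass over n records with O(k) work each, so O(nk) overall.

    Returns the indices of records whose key was possibly seen before.
    Flagged records still need full record matching to rule out false positives.
    """
    bf = BloomFilter()
    flagged = []
    for i, record in enumerate(records):
        key = match_key(record)
        if bf.might_contain(key):
            flagged.append(i)
        else:
            bf.add(key)
    return flagged
```

For example, `find_candidate_duplicates([{"name": "Alice Smith", "year": 1980}, {"name": "Bob Lee", "year": 1975}, {"name": " alice smith ", "year": 1980}])` is guaranteed to flag index 2, because Bloom filters have no false negatives. Only flagged records are passed to the more expensive pairwise comparison, which avoids comparing all O(n²) pairs.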