Validating Distance-Based Record Linkage with Probabilistic Record Linkage

This work compares two alternative methods for record linkage: distance-based and probabilistic record linkage. It evaluates the performance of both approaches on categorical data. To that end, a distance over ordinal and nominal scales is defined. The paper shows that, for categorical data, distance-based and probabilistic record linkage yield similar results in terms of the number of re-identified records. As a consequence, the distance proposed for ordinal and nominal scales is implicitly validated.
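
As an illustration of the distance-based approach on categorical data, the following minimal sketch links each record of one file to its nearest record in another and counts the re-identified records. The abstract does not state the distance itself, so the definitions here are assumptions: a 0/1 distance for nominal attributes and a normalized rank difference for ordinal attributes. The function names (`nominal_distance`, `ordinal_distance`, `record_distance`, `link`, `count_reidentified`) and the toy data are hypothetical.

```python
from typing import List, Optional, Sequence, Tuple

# A scale is None for a nominal attribute, or the ordered list of
# categories for an ordinal attribute (hypothetical representation).
Scale = Optional[Sequence[str]]
Record = Tuple[str, ...]


def nominal_distance(a: str, b: str) -> float:
    """Assumed nominal distance: 0 if the categories match, 1 otherwise."""
    return 0.0 if a == b else 1.0


def ordinal_distance(a: str, b: str, order: Sequence[str]) -> float:
    """Assumed ordinal distance: rank difference scaled to [0, 1]."""
    if len(order) < 2:
        return 0.0
    return abs(order.index(a) - order.index(b)) / (len(order) - 1)


def record_distance(r: Record, s: Record, scales: Sequence[Scale]) -> float:
    """Sum of per-attribute distances between two records."""
    total = 0.0
    for x, y, order in zip(r, s, scales):
        if order is None:
            total += nominal_distance(x, y)
        else:
            total += ordinal_distance(x, y, order)
    return total


def link(file_a: List[Record], file_b: List[Record],
         scales: Sequence[Scale]) -> List[int]:
    """For each record in file_a, return the index of its nearest record in file_b."""
    return [
        min(range(len(file_b)),
            key=lambda j: record_distance(r, file_b[j], scales))
        for r in file_a
    ]


def count_reidentified(links: List[int]) -> int:
    """Count correct links, assuming record i in file_a matches record i in file_b."""
    return sum(1 for i, j in enumerate(links) if i == j)


# Toy data: one nominal attribute (colour) and one ordinal attribute (level).
scales: List[Scale] = [None, ["low", "medium", "high"]]
original = [("red", "low"), ("blue", "high"), ("green", "medium")]
masked = [("red", "medium"), ("blue", "high"), ("green", "low")]

links = link(original, masked, scales)
reidentified = count_reidentified(links)
```

The count in `reidentified` is the kind of figure the comparison rests on. A probabilistic linkage procedure run on the same pair of files would produce its own count, and the two can then be set side by side.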