An Experiment in Naive Bayesian Record Linkage

Sharing data carries a risk of disclosing sensitive information about the individuals to whom the data sets relate. A so-called ‘data intruder’ can use computationally complex techniques to link such data and discover information about targeted individuals. Heuristic approaches to limiting this risk are aimed at the more casual intruder. A knowledgeable intruder, armed with data mining tools, can nevertheless uncover sensitive information from ostensibly safe data sets. This paper considers a method for assessing the risk of disclosure by a relatively knowledgeable intruder, whilst avoiding the computational problems associated with exact probability calculations.