This paper presents a database linkage system based on the probabilistic record linkage technique, developed in C++ with the Borland C++ Builder 3.0 programming environment. The system was tested by linking data sources of different sizes and was evaluated in terms of both processing time and sensitivity for identifying true record pairs. Processing took significantly less time with the program than with manual processing, especially for larger databases. The manual and automatic processes had equivalent sensitivities for databases with fewer records. As the number of records grew, however, the sensitivity of the manual process clearly declined, whereas that of the automatic process did not. Although still in its initial stage of development, the system performed well in terms of both processing speed and sensitivity. While the overall performance of the algorithms was satisfactory, we intend to evaluate other routines in an attempt to improve the system's performance.
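The abstract does not describe how candidate pairs are scored, so the following is only a minimal sketch of the agreement and disagreement weights commonly used in probabilistic record linkage (the Fellegi-Sunter formulation), together with the sensitivity measure used in the evaluation. It is not the system's actual code. The names `FieldParams`, `matchScore`, and `sensitivity` are hypothetical, and the m and u probabilities are assumed to be supplied by the caller.

```cpp
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-field parameters (assumed, not taken from the paper):
//   m = P(field agrees | the two records are a true match)
//   u = P(field agrees | the two records are not a match)
struct FieldParams {
    double m;
    double u;
};

// Composite score for one candidate pair: the sum over compared fields of
// log2(m/u) when the field agrees and log2((1-m)/(1-u)) when it disagrees.
// Assumes params and agrees have the same length and 0 < m, u < 1.
double matchScore(const std::vector<FieldParams>& params,
                  const std::vector<bool>& agrees) {
    double score = 0.0;
    for (std::size_t i = 0; i < params.size(); ++i) {
        const FieldParams& p = params[i];
        score += agrees[i] ? std::log2(p.m / p.u)
                           : std::log2((1.0 - p.m) / (1.0 - p.u));
    }
    return score;
}

// Sensitivity: the fraction of true record pairs that the process identified.
// Returns 0 when there are no true pairs, to avoid dividing by zero.
double sensitivity(std::size_t truePairsFound, std::size_t truePairsTotal) {
    if (truePairsTotal == 0) {
        return 0.0;
    }
    return static_cast<double>(truePairsFound) /
           static_cast<double>(truePairsTotal);
}
```

In a sketch like this, a pair whose score exceeds an upper threshold would be classified as a link and one below a lower threshold as a non-link, with the pairs in between left for clerical review. The thresholds are likewise assumptions and are not given in the text.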