Record Linkage I: Evaluation of Commercially Available Record Linkage Software for Use in NASS
Record linkage is an important technique in NASS for minimizing the presence of duplicate names on its list sampling frame of farm operators and agribusinesses. In the late 1970s, NASS developed an automated record linkage system for this purpose, which runs on an IBM mainframe. With changes in technology, the need has arisen for portability between platforms, integration with client/server technology, and interactive operation. NASS also wants to reduce resource expenditures on record linkage while maintaining the quality of the process. The growing availability of commercial record linkage solutions has made it unnecessary to develop a new record linkage system or to undertake an expensive and difficult rewrite of the old one. This report evaluates six commercially available record linkage software packages for their suitability for NASS's purposes. The report starts with a brief discussion of record linkage in NASS, then discusses the statistical theory behind the most popular probabilistic record linkage solution, that of Fellegi and Sunter. Next, the report discusses the requirements for a NASS record linkage system. Detailed reviews of the six software packages follow. Except for the review of AUTOMATCH, which NASS has tested extensively, these reviews are based on information provided by the software manufacturers. The report concludes that, for NASS's purposes, AUTOMATCH is the best choice. The report ends with a glossary of record linkage terminology and a checklist for the evaluation of record linkage software packages.