Record Linkage in Healthcare: Applications, Opportunities, and Challenges for Public Health

Recent years have witnessed the development of new record linkage technologies that are increasingly used for data integration in a variety of application settings. This article reviews recent developments in medical record linkage and their implications for healthcare research and public health policy. In particular, the authors assess the key advantages and possible limitations of record linkage techniques and technologies in healthcare scenarios where different pieces of a patient's record are collected and managed by different agencies. First, the authors give a brief overview of deterministic, probabilistic, and unsupervised record linkage techniques and their advantages and limitations. Then, the authors describe current probabilistic record linkage software and its functionality, and present specific cases where probabilistic linkage has been used successfully to enhance decision-making in healthcare delivery and in healthcare-related public policy making. Finally, the authors outline some of the critical issues and challenges of integrating medical records across distributed databases, including technical considerations as well as concerns about patient privacy and confidentiality.

DOI: 10.4018/jhdri.2010070104. International Journal of Healthcare Delivery Reform Initiatives, 2(3), 29-47, July-September 2010. Copyright © 2010, IGI Global. Copying or distributing in print or electronic forms without written permission of IGI Global is prohibited.

[...] databases tend to be fragmented and incomplete. Thus, the ability to compare and match data records from multiple sources in order to determine which sets of records belong to the same person, object, or event has become a critical task for many organizations. However, the possibility of extensive analysis using these databases relies on the ability to integrate heterogeneous databases across organizations and functional units. Such data integration requires the presence of an error-free unique identifier or key attribute common among the data sets being matched. Unfortunately, in most real-world situations, such a common key attribute is rarely available.

Consequently, instead of relying upon a deterministic approach using unique identifiers, past research studies have proposed probabilistic algorithms to achieve the goal of record matching across heterogeneous databases. Among these early studies, the seminal work by Newcombe, Kennedy, Axford, and James (1959) and by Fellegi and Sunter (1969) provides theoretical frameworks for computer-aided record linkage operations. More recent scholarly studies on this topic include Dey, Sarkar, and De (1998); Bell and Sethi (2001); Dey, Sarkar, and De (2002); Verykios, Moustakides, and Elfeky (2002); Sarathy and Muralidhar (2006); and Jiang, Sarkar, De, and Dey (2007). Although the algorithmic procedures suggested in these studies vary, they share a common objective: linking records that belong to the same entity while minimizing the likelihood of erroneous matching (i.e., ensuring high sensitivity and specificity).

The statistical theory used in record linkage was developed in the 1950s and was further refined in the 1970s and 1980s (Jaro, 1989; Newcombe et al., 1959). Until the early 1980s, no commercial record linkage software was marketed, and those with a need for record linkage had to develop their own software (e.g., the Generalized Record Linkage System (GRLS) developed at Statistics Canada). They often faced the choice of using less accurate methods or expending a considerable amount of resources to create proprietary systems. For example, in the late 1970s, the U.S. National Agricultural Statistics Service spent what is conservatively estimated as 50 staff-years to develop a state-of-the-art system (Day, 1997).
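To make the probabilistic approach mentioned above more concrete, the following is a minimal sketch of field-level agreement weights in the spirit of Fellegi and Sunter (1969). It is an illustration under stated assumptions, not the procedure of any study or software package cited in this article: the field names, the m-probabilities (chance that a field agrees for a true match), the u-probabilities (chance that it agrees for a non-match), and the two thresholds are all hypothetical values chosen for the example.

```python
import math

# Hypothetical (m, u) pairs per field; real values would be estimated from data.
# m = P(field agrees | records are a true match)
# u = P(field agrees | records are not a match)
FIELD_PARAMS = {
    "last_name": (0.95, 0.01),
    "birth_date": (0.97, 0.003),
    "zip_code": (0.90, 0.05),
}

def match_weight(rec_a, rec_b, params=FIELD_PARAMS):
    """Sum log2 agreement/disagreement weights over the compared fields."""
    total = 0.0
    for field, (m, u) in params.items():
        a, b = rec_a.get(field), rec_b.get(field)
        if a is None or b is None:
            continue  # assumption: a missing value contributes no evidence
        if a == b:
            total += math.log2(m / u)              # agreement weight (positive)
        else:
            total += math.log2((1 - m) / (1 - u))  # disagreement weight (negative)
    return total

def classify(weight, lower=0.0, upper=10.0):
    """Three-way decision using two illustrative thresholds."""
    if weight >= upper:
        return "link"
    if weight <= lower:
        return "non-link"
    return "possible link"  # would typically be sent to clerical review

rec_1 = {"last_name": "smith", "birth_date": "1980-01-02", "zip_code": "84101"}
rec_2 = {"last_name": "smyth", "birth_date": "1980-01-02", "zip_code": "84101"}
print(classify(match_weight(rec_1, rec_2)))
```

In this sketch, agreement on a rarely coinciding field such as date of birth adds more evidence than agreement on a commonly shared one such as ZIP code, which is the intuition behind weighting fields rather than requiring exact agreement on a single key.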
In addition to the past studies mentioned above, scholarly work in this area spans several other academic disciplines (e.g., statistics, information systems, management sciences) as well as communities of practitioners (e.g., in electronic commerce, public health, vital records, welfare fraud detection, e-government). In this article, we present a review of recent developments in record linkage technologies relevant to healthcare research and public health policies. The remainder of the article is organized as follows. The next section summarizes the existing literature on record linkage and the importance of record linkage in healthcare and public health. A brief introduction to different record linkage techniques is then presented, followed by examples of successful applications of record linkage in healthcare and public health. We then discuss potential opportunities and challenges in using record linkage. The last section concludes our discussion on this topic.

Past Research in Record Linkage

Record linkage is applicable both within and across data sources. Typically, record linkage is defined as a computer-based process of matching two or more records, from different and often heterogeneous sources of data, that refer to the same entity, such as a person, an event, or another object of interest. However, record linkage is sometimes performed within a single data set when a database holds multiple records for a person or other entity (e.g., records for multiple hospitalizations in a hospital discharge data set covering a 12-month period). Record linkage within a single data set is also performed to remove duplicate records, a task referred to as "deduplication" (Winkler, 1999).

There are many applications of record linkage in both the public and private sectors, and its use has become even more significant with advances in the underlying techniques and the implementation tools.
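The within-data-set case described above can be sketched with a simple deterministic rule: records whose normalized identifying fields are identical are grouped as candidate duplicates. This is a minimal sketch under assumptions; the field names and the normalization steps are hypothetical, and exact matching of this kind will miss records that differ through typing errors, which is one motivation for probabilistic methods.

```python
from collections import defaultdict

def normalize(value):
    """Lowercase and collapse whitespace so trivial variations compare equal."""
    return " ".join(str(value).lower().split())

def match_key(record, fields=("last_name", "first_name", "birth_date")):
    """Build a deterministic key from hypothetical identifying fields."""
    return tuple(normalize(record.get(f, "")) for f in fields)

def group_duplicates(records):
    """Return lists of record positions that share the same normalized key."""
    groups = defaultdict(list)
    for idx, rec in enumerate(records):
        groups[match_key(rec)].append(idx)
    return [ids for ids in groups.values() if len(ids) > 1]

discharges = [
    {"last_name": "Smith ", "first_name": "Ann", "birth_date": "1980-01-02"},
    {"last_name": "Jones", "first_name": "Bo", "birth_date": "1975-05-05"},
    {"last_name": "smith", "first_name": "ANN", "birth_date": "1980-01-02"},
]
print(group_duplicates(discharges))  # positions 0 and 2 share a key
```

The same grouping can serve either purpose named in the text: the grouped records can be collapsed to remove duplicates, or kept together to follow one person across multiple hospitalizations.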
Detailed technical descriptions of record linkage are available elsewhere (Fair, 1995, 1997; Newcombe, 1994). In addition to applications in health care and public health, record linkage is widely employed in other fields. For example, Probert, Semenciw, Mao, and Gentleman (1997) described how record linkage was used to integrate immigration and mortality databases in Canada. Quass and Starkey (2003), White (1997), and [...]

[1] Martha E. Fair. Record Linkage in an Information Age Society, 1996.

[2] M. Khoury et al. Risk of childhood cancer for infants with birth defects. I. A record-linkage study, Atlanta, Georgia, 1968-1988, 1993, American Journal of Epidemiology.

[3] H. B. Newcombe et al. Automatic linkage of vital records, 1959, Science.

[4] Raymond J. Mooney et al. Adaptive duplicate detection using learnable string similarity measures, 2003, KDD '03.

[5] L. Cannon-Albright et al. A genealogical assessment of heritable predisposition to asthma mortality, 2007, American Journal of Respiratory and Critical Care Medicine.

[6] Shiliang Liu et al. Risk of Maternal Postpartum Readmission Associated With Mode of Delivery, 2005, Obstetrics and Gynecology.

[7] Sven Cnattingius et al. Mortality and cancer incidence among individuals with Down syndrome, 2003, Archives of Internal Medicine.

[8] M. J. Goldacre et al. Risk of multiple sclerosis after head injury: record linkage study, 2005, Journal of Neurology, Neurosurgery & Psychiatry.

[9] Sumit Sarkar et al. A Framework for Reconciling Attribute Values from Multiple Data Sources, 2007, Manag. Sci.

[10] Peter Christen et al. Towards Automated Record Linkage, 2006, AusDM.

[11] E. P. Steinberg et al. Hospital readmissions in the Medicare population, 1984, The New England Journal of Medicine.

[12] M. Varner et al. Episiotomy and Obstetric Trauma in Nevada: Evidence from Linked Hospital Discharge and Birth Data, 2007.

[13] P. McElduff et al. Readmission after hysterectomy and prophylactic low molecular weight heparin: retrospective case-control study, 2006, BMJ: British Medical Journal.

[14] Rathindra Sarathy et al. Secure and useful data sharing, 2006, Decis. Support Syst.

[15] Ahmed K. Elmagarmid et al. TAILOR: a record linkage toolbox, 2002, Proceedings 18th International Conference on Data Engineering.

[16] Hsiu-Ju Chang et al. Gender differences in healthcare service utilisation 1 year before suicide: national record linkage study, 2009, The British Journal of Psychiatry: The Journal of Mental Science.

[17] Craig A. Knoblock et al. Learning object identification rules for information integration, 2001, Inf. Syst.

[18] D. Clark et al. Practical introduction to record linkage for injury research, 2004, Injury Prevention.

[19] S. Edge et al. Claims data linked to hospital registry data enhance evaluation of the quality of care of breast cancer, 2010, Journal of Surgical Oncology.

[20] William E. Winkler et al. The State of Record Linkage and Current Research Problems, 1999.

[21] S. Kisely et al. Mortality in individuals who have had psychiatric treatment, 2005, British Journal of Psychiatry.

[22] Neeraj Sood et al. Impact of Postpartum Hospital-Stay Legislation on Newborn Length of Stay, Readmission, and Mortality in California, 2006, Pediatrics.

[23] S. Oddie et al. Early discharge and readmission to hospital in the first month of life in the Northern Region of the UK during 1998: a case cohort study, 2005, Archives of Disease in Childhood.

[24] Dennis Deck et al. Record linkage software in the public domain: a comparison of Link Plus, The Link King, and a `basic' deterministic algorithm, 2008, Health Informatics J.

[25] Sumit Sarkar et al. A Distance-Based Approach to Entity Reconciliation in Heterogeneous Databases, 2002, IEEE Trans. Knowl. Data Eng.

[26] Daijin Kim et al. Case Report: Opportunities for Electronic Health Record Data to Support Business Functions in the Pharmaceutical Industry - A Case Study from Pfizer, Inc, 2008, J. Am. Medical Informatics Assoc.

[27] Ivan P. Fellegi et al. A Theory for Record Linkage, 1969.

[28] Charles W. Given et al. Medicaid, Medicare, and the Michigan Tumor Registry: A Linkage Strategy, 2007, Medical Decision Making: An International Journal of the Society for Medical Decision Making.

[29] Anil Sethi et al. Matching records in a national medical patient index, 2001, CACM.

[30] J. Emery et al. The challenge of integrating genetic medicine into primary care, 2001, BMJ: British Medical Journal.

[31] Dallan Quass et al. Record Linkage for Genealogical Databases, 2003.

[32] G. Escobar et al. Rehospitalisation after birth hospitalisation: patterns among infants of all gestations, 2005, Archives of Disease in Childhood.

[33] C. Weel et al. The use of routinely collected computer data for research in primary care: opportunities and challenges, 2006, Family Practice.

[34] Bernard Friedman et al. Racial/ethnic disparities in potentially preventable readmissions: the case of diabetes, 2005, American Journal of Public Health.

[35] Sara Rosenbaum et al. How common are electronic health records in the United States? A summary of the evidence, 2006, Health Affairs.

[36] Ken Flegel. Getting to the electronic medical record, 2008, Canadian Medical Association Journal.

[37] T. Blakely et al. Probabilistic record linkage and a method to calculate the positive predictive value, 2002, International Journal of Epidemiology.

[38] Matthew A. Jaro et al. Advances in Record-Linkage Methodology as Applied to Matching the 1985 Census of Tampa, Florida, 1989.

[39] George V. Moustakides et al. A Bayesian decision model for cost optimal record matching, 2003, The VLDB Journal.

[40] Linda J. Scheetz et al. Evaluation of injury databases as a preliminary step to developing a triage decision rule, 2008, Journal of Nursing Scholarship: An Official Publication of Sigma Theta Tau International Honor Society of Nursing.

[41] G. Shah et al. Lessons learned in using hospital discharge data for state and national public health surveillance: implications for Centers for Disease Control and Prevention tracking program, 2008, Journal of Public Health Management and Practice: JPHMP.

[42] S. Douglas et al. Trial of a disease management program to reduce hospital readmissions of the chronically critically ill, 2005, Chest.

[43] Sumit Sarkar et al. A Probabilistic Decision Model for Entity Matching in Heterogeneous Databases, 1998.

[44] J. Emery et al. Common hereditary cancers and implications for primary care, 2001, The Lancet.

[45] H. Newcombe. Cohorts and privacy, 1994, Cancer Causes & Control.

[46] J. Sidorov. It Ain't Necessarily So: The Electronic Health Record And The Unlikely Prospect Of Reducing Health Care Costs, 2006, Health Affairs.

[47] Sanjay Saint et al. Hospital Readmission for Bronchiolitis, 2005, Clinical Pediatrics.

[48] A. Davidson et al. Linking Children's Health Information Systems: Clinical Care, Public Health, Emergency Medical Systems, and Schools, 2009, Pediatrics.

[49] Robert J. Stroebel et al. Effect of discharge instructions on readmission of hospitalised patients with heart failure: do all of the Joint Commission on Accreditation of Healthcare Organizations heart failure core measures reflect better care?, 2006, Quality and Safety in Health Care.

[50] K. Cofrin et al. Cesarean Deliveries and Newborn Injuries: Evidence from Linked Utah Birth Certificate and Inpatient Discharge Data, 2007.

[51] J. Benbassat et al. Hospital readmissions as a measure of quality of health care: advantages and limitations, 2000, Archives of Internal Medicine.

[52] K. McClanahan. Balancing Good Intentions: Protecting the Privacy of Electronic Health Information, 2008.