Evaluation of record linkage between a large healthcare provider and the Utah Population Database

OBJECTIVE Electronically linked datasets have become an important part of clinical research. Information from multiple sources can be used to identify comorbid conditions and patient outcomes, measure use of healthcare services, and enrich demographic and clinical variables of interest. Innovative approaches for creating research infrastructure beyond a traditional data system are necessary.

MATERIALS AND METHODS Records from a large healthcare system's enterprise data warehouse (EDW) were linked to a statewide population database, and a master subject index was created. The authors evaluate the linkage, along with the impact of missing information in EDW records and the coverage of the population database. Both the EDW and the population database contain a common subset of cancer records, which allows a cancer-specific evaluation of the linkage.

RESULTS About 3.4 million records (60.8%) in the EDW were linked to the population database with a minimum accuracy of 96.3%. An estimated 24.8% of target records were absent from the population database, an estimate that made it possible to assess how the amount and type of information missing from a record affected the linkage. By contrast, 99% of the records from the oncology data mart linked; these records had fewer missing fields, and their completeness correlated positively with the number of patient visits.

DISCUSSION AND CONCLUSION A general-purpose research infrastructure was created that allows disease-specific cohorts to be identified. Creating an index between institutions is useful because it allows each institution to maintain control and confidentiality of its own information.
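The idea of a master subject index can be illustrated with a minimal Python sketch. This is not the authors' linkage method, which the abstract does not describe; it assumes a simple exact match on three hypothetical fields (`last_name`, `first_name`, `birth_date`), and all record values and function names are invented for illustration. It shows two points from the abstract: a record with missing fields is harder to link, and the index stores only pairs of identifiers, so each institution keeps its own data.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Record:
    """Hypothetical minimal record; real systems hold many more fields."""
    local_id: str
    last_name: Optional[str]
    first_name: Optional[str]
    birth_date: Optional[str]  # ISO date string, e.g. "1950-01-02"


def match_key(rec: Record) -> Optional[Tuple[str, ...]]:
    """Return a normalized key, or None if any matching field is missing."""
    fields = (rec.last_name, rec.first_name, rec.birth_date)
    if any(f is None or not f.strip() for f in fields):
        return None
    return tuple(f.strip().lower() for f in fields)


def build_index(edw: List[Record], population: List[Record]) -> Dict[str, str]:
    """Map EDW ids to population-database ids (assumed exact-match rule)."""
    population_by_key: Dict[Tuple[str, ...], str] = {}
    for rec in population:
        key = match_key(rec)
        if key is not None:
            population_by_key.setdefault(key, rec.local_id)

    index: Dict[str, str] = {}
    for rec in edw:
        key = match_key(rec)
        if key is not None and key in population_by_key:
            # Only identifiers are stored; no clinical or demographic data.
            index[rec.local_id] = population_by_key[key]
    return index


def link_rate(index: Dict[str, str], edw: List[Record]) -> float:
    """Fraction of EDW records that appear in the index."""
    return len(index) / len(edw) if edw else 0.0


# Invented example data.
edw_records = [
    Record("E1", "Smith", "Ann", "1950-01-02"),
    Record("E2", "Jones", None, "1960-03-04"),   # missing first name
    Record("E3", "Lee", "Bo", "1970-05-06"),     # absent from population data
]
population_records = [
    Record("P1", "smith", "ann", "1950-01-02"),
    Record("P2", "Jones", "Carl", "1960-03-04"),
]

master_index = build_index(edw_records, population_records)
```

In this toy data, `E2` fails to link because a field is missing and `E3` fails because the person is absent from the population database, which are the two sources of non-linkage the abstract distinguishes. A production linkage would typically use probabilistic scoring rather than exact matching.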
