Survival analysis with electronic health record data: Experiments with chronic kidney disease

This article presents a detailed survival analysis for chronic kidney disease (CKD). The analysis is based on electronic health record (EHR) data comprising almost two decades of clinical observations collected at New York-Presbyterian, a large hospital in New York City with one of the oldest electronic health records in the United States. Our approach centers on Bayesian multiresolution hazard modeling, with the objective of capturing the changing hazard of CKD over time, adjusted for patient clinical covariates and kidney-related laboratory tests. Special attention is paid to statistical issues common to all EHR data, such as cohort definition, missing data and censoring, variable selection, and the potential for joint survival and longitudinal modeling, each of which is discussed both in general and within the EHR CKD context.
