Preliminary exploratory data analysis of simulated national clinical data research network for future use in annotation of a rare tumor biobanking initiative

Observational data resources based on clinical data captured in the electronic health record (EHR) have produced significant learning opportunities in many areas of medicine. These large data resources can span multiple hospital systems and employ common semantics, ontologies, and data models. They have uncovered critical patient safety issues and spurred observational research and clinical decision support. In the age of precision medicine, there is also an increased need to obtain genomic and clinical data to discover novel treatments for the deadliest diseases. Accordingly, there are efforts to create deep-dive, disease-specific repositories that include tissue in biobanks. These repositories require significant human annotation of biospecimens. Securing the data is especially critical in rare pediatric brain tumors. In the specific case of the Children's Brain Tumor Tissue Consortium (CBTTC), an international rare pediatric brain tumor repository, the number of patients who need to be followed prospectively is outpacing the capacity for human annotation. In this preliminary study, we perform a prescribed exploratory data analysis on simulated data in the Observational Medical Outcomes Partnership (OMOP) Common Data Model (CDM), which is employed by the pediatric data network PEDSnet, with the intention of ascertaining the feasibility of automatically annotating patient records in the CBTTC.
