A Framework for Analysis, Ontological Evaluation, and Visualization in Preparation to Predictive Analytics in Pediatric Brain Tumor Research

We provide a generalizable framework for the systematic analysis of complex, longitudinal clinical features in pediatric cancer. We use a threefold pipeline of exploratory data analysis, ontological categorization, and multi-modal data transformation in preparation for predictive analytics. We derive a data-driven phenotype from a subset, focused specifically on high-grade gliomas, of a sample of over 1900 brain tumor cases. We implement an analyst-friendly process that produces machine learning-ready data sets, based on domain ontologies, that are ready for enumeration and vectorization. The results are data points readable by clinical domain experts, drawn from 4.3 million observational events across 16,000 patient days. In this research, we address the gap in phenotypic data features by using extensive harmonized observational clinical data, and we identify resources and specific processes for their use in rare tumor research.
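
As a rough illustration of what "enumeration and vectorization" of ontology-coded observations could look like, the following minimal Python sketch assigns an integer index to each distinct concept code and builds a per-patient count vector. It is not the authors' implementation: the event tuples, the concept codes (`C001` and so on), and the function names `enumerate_concepts` and `vectorize` are all hypothetical and chosen only for this example.

```python
from collections import Counter

# Hypothetical observational events: (patient_id, ontology_concept_code).
# These identifiers and codes are invented for illustration only.
events = [
    ("p1", "C001"),
    ("p1", "C002"),
    ("p1", "C001"),
    ("p2", "C003"),
    ("p2", "C002"),
]


def enumerate_concepts(events):
    """Assign each distinct concept code an integer index, in sorted order."""
    codes = sorted({code for _, code in events})
    return {code: index for index, code in enumerate(codes)}


def vectorize(events, vocab):
    """Build one count vector per patient over the enumerated concepts."""
    counts = {}
    for patient, code in events:
        counts.setdefault(patient, Counter())[code] += 1
    # Counter returns 0 for codes a patient never had, so every vector
    # has the same length and column order as the vocabulary.
    return {
        patient: [patient_counts[code] for code in vocab]
        for patient, patient_counts in counts.items()
    }


vocab = enumerate_concepts(events)
vectors = vectorize(events, vocab)
```

Sorting the codes before indexing keeps the column order reproducible across runs, which matters when vectors built at different times are compared or fed to the same model.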
