TrialGraph: Machine Intelligence Enabled Insight from Graph Modelling of Clinical Trials

A major impediment to successful drug development is the complexity, cost, and scale of clinical trials. The detailed internal structure of clinical trial data can make conventional optimization difficult to achieve. Recent advances in machine learning, specifically the analysis of graph-structured data, have the potential to significantly improve clinical trial design. TrialGraph applies these methodologies to produce a proof-of-concept framework for developing models that can aid drug development and benefit patients. In this work, we first introduce a curated clinical trial data set compiled from the ClinicalTrials.gov (CT.gov), AACT, and TrialTrove databases (n=1191 trials, representing one million patients) and describe the conversion of this data to graph-structured formats. We then detail the mathematical basis and implementation of a selection of graph machine learning algorithms, which typically apply standard machine learning classifiers to graph data embedded in a low-dimensional feature space. We trained these models to predict side effect information for a clinical trial given information on disease, existing medical conditions, and treatment. The MetaPath2Vec algorithm performed exceptionally well, with standard Logistic Regression, Decision Tree, Random Forest, Support Vector, and Neural Network classifiers exhibiting typical ROC-AUC scores of 0.85, 0.68, 0.86, 0.80, and 0.77, respectively. By contrast, the best-performing classifiers achieved typical ROC-AUC scores of only 0.70 when trained on equivalent array-structured data. Our work demonstrates that graph modelling can significantly improve prediction accuracy on appropriate datasets. Successive versions of the project that refine modelling assumptions and incorporate more data types could produce excellent predictors with real-world applications in drug development.
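
As a rough illustration of the embedding step that MetaPath2Vec builds on, the sketch below generates a metapath-guided random walk over a toy heterogeneous graph. It is not the TrialGraph implementation: the node names, node types, edges, and the `metapath_walk` helper are all hypothetical and chosen only for the example. In the published metapath2vec method, many such walks are fed to a skip-gram model to learn node embeddings, which could then be passed to standard classifiers; those later steps are not shown here.

```python
import random

# Hypothetical toy data: every node name, type, and edge below is invented
# for illustration and does not come from the TrialGraph data set.
NODE_TYPE = {
    "trial_A": "trial", "trial_B": "trial",
    "drug_X": "treatment", "drug_Y": "treatment",
    "nausea": "side_effect", "headache": "side_effect",
}
EDGES = [
    ("trial_A", "drug_X"), ("trial_B", "drug_X"), ("trial_B", "drug_Y"),
    ("trial_A", "nausea"), ("trial_B", "nausea"), ("trial_B", "headache"),
]


def build_adjacency(edges):
    """Build an undirected adjacency list from (u, v) edge pairs."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, []).append(v)
        adj.setdefault(v, []).append(u)
    return adj


def metapath_walk(adj, node_type, start, metapath, walk_length, rng):
    """Random walk that only follows edges matching the metapath's type pattern.

    `metapath` is a list of node types whose first and last entries match,
    e.g. ["trial", "treatment", "trial"], so the pattern can repeat.
    """
    if metapath[0] != metapath[-1]:
        raise ValueError("metapath must start and end with the same type")
    if node_type[start] != metapath[0]:
        raise ValueError("start node type must match the metapath's first type")

    cycle = metapath[:-1]  # drop the repeated final type
    walk = [start]
    step = 0
    while len(walk) < walk_length:
        wanted = cycle[(step + 1) % len(cycle)]
        candidates = [n for n in adj.get(walk[-1], []) if node_type[n] == wanted]
        if not candidates:
            break  # dead end: no neighbour of the required type
        walk.append(rng.choice(candidates))
        step += 1
    return walk


adj = build_adjacency(EDGES)
rng = random.Random(0)  # fixed seed so repeated runs give the same walk
walk = metapath_walk(adj, NODE_TYPE, "trial_A",
                     ["trial", "treatment", "trial"], 5, rng)
print(walk)
```

Constraining each step to the next type in the metapath is what distinguishes this from a plain random walk: the resulting node sequences respect the trial-treatment-trial pattern, so co-occurrence in a walk reflects a specific kind of relationship between trials.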
