Machine Learning-Based Modeling of Big Clinical Trials Data for Adverse Outcome Prediction: A Case Study of Death Events

Clinical trials carry potential risks for participants, which can result in unexpected adverse events. To quantify and predict the risk of adverse outcomes, we used a large collection of clinical trial reports to build machine learning models, with death events as the prediction target. From ClinicalTrials.gov, we collected 28,340 reports and transformed the data into vectorized machine learning features. These features were harmonized across studies using semantic mapping and feature selection techniques. The selected clinical trial features were used to build five machine learning models, whose relative performance we evaluated and compared on the prediction task. Logistic regression achieved the best overall receiver operating characteristic (ROC) score, 0.7344. This exploratory study shows that it is feasible to use clinical trial factors to predict adverse outcomes, demonstrated here for death outcomes. Predicting adverse outcomes could help clinical trials estimate the risk of harm and design better mechanisms to protect participants. We hope that, by using our models, a clinical trial expert will be able to assess at an early stage whether serious adverse events are likely to occur in a trial and to estimate which trial factors could contribute to them.
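As a reading aid for the reported ROC score, the following minimal sketch computes the area under the ROC curve (AUC), assuming that is the quantity the score refers to. It uses the rank interpretation of the AUC: the probability that a randomly chosen positive case (here, a trial with a death event) receives a higher predicted score than a randomly chosen negative case. The function name `roc_auc` and the toy labels and scores are hypothetical and are not taken from the study.

```python
def roc_auc(labels, scores):
    """Return the ROC AUC as the fraction of (positive, negative) pairs
    in which the positive example has the higher score; ties count half."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: 1 = death event reported, 0 = no death event.
labels = [1, 0, 1, 0, 0, 1]
# Hypothetical predicted risk scores from some classifier.
scores = [0.9, 0.3, 0.6, 0.7, 0.2, 0.4]

auc = roc_auc(labels, scores)
print(round(auc, 4))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, so under this assumption a score of 0.7344 means a positive case is ranked above a negative one in roughly 73% of such pairs.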
