Evaluating predictive modeling algorithms to assess patient eligibility for clinical trials from routine data

Background: The necessity to translate eligibility criteria from free text into decision rules that are compatible with data from the electronic health record (EHR) constitutes the main challenge when developing and deploying clinical trial recruitment support systems. Recruitment decisions based on case-based reasoning, i.e., using past cases rather than explicit rules, could dispense with the need for translating eligibility criteria and could also be implemented largely independently of the terminology of the EHR’s database. We evaluated the feasibility of predictive modeling to assess the eligibility of patients for clinical trials and report on a prototype’s performance for different system configurations.

Methods: The prototype used existing basic patient data of manually assessed eligible and ineligible patients to induce prediction models. Performance was measured retrospectively for three clinical trials by plotting receiver operating characteristic (ROC) curves and comparing the area under the curve (ROC-AUC) for different prediction algorithms, different sizes of the learning set, and different numbers and aggregation levels of the patient attributes.

Results: Random forests were generally among the best-performing models, with a maximum ROC-AUC of 0.81 (CI: 0.72-0.88) for trial A, 0.96 (CI: 0.95-0.97) for trial B and 0.99 (CI: 0.98-0.99) for trial C. The full potential of this algorithm was reached after learning from approximately 200 manually screened patients (eligible and ineligible). Neither block- nor category-level aggregation of diagnosis and procedure codes influenced the algorithms’ performance substantially.

Conclusions: Our results indicate that predictive modeling is a feasible approach to support patient recruitment into clinical trials. Its major advantages over the commonly applied rule-based systems are its independence from the concrete representation of eligibility criteria and EHR data and its potential for automation.
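As an illustration of the performance measure named in the abstract, the ROC-AUC can be read as the probability that a randomly chosen eligible patient receives a higher predicted score than a randomly chosen ineligible one. The following is a minimal sketch under stated assumptions, not the authors' implementation: the function name `roc_auc`, the 1/0 label encoding, and the toy scores are hypothetical and do not come from the study data.

```python
from typing import Sequence


def roc_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Return the ROC-AUC via pairwise comparison of scores.

    labels: 1 = eligible, 0 = ineligible (assumed encoding).
    scores: predicted eligibility scores, higher = more likely eligible.
    Ties between an eligible and an ineligible patient count as 0.5.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one eligible and one ineligible case")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy data: three eligible and three ineligible patients.
labels = [1, 0, 1, 0, 0, 1]
scores = [0.9, 0.4, 0.35, 0.2, 0.8, 0.7]
auc = roc_auc(labels, scores)
print(f"ROC-AUC = {auc:.3f}")
```

The pairwise loop is quadratic in the number of patients, which is acceptable for learning sets of a few hundred cases; a rank-based formulation would scale better for larger data.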
