Text Classification-Based Automatic Recruitment of Patients for Clinical Trials: A Silver Standards-Based Case Study

Insufficient recruitment of appropriate subjects plagues most clinical research trials. One barrier is the lack of an efficient way to identify eligible subjects. Researchers have worked to harness computing power to automate the identification of potential subjects for clinical trials, with modest success. We use text classification to automatically identify patients for a hypothetical Acute Coronary Syndrome clinical research study from intensive care unit discharge summaries. We apply several state-of-the-art classification methods, including Bayesian Logistic Regression, AdaBoost, Support Vector Machines, and Random Forests, to build models from manually assigned administrative ICD-9 codes. We then apply these models to discharge summaries that a board-certified cardiologist labeled for eligibility for the hypothetical research study. The best models achieve an area under the ROC curve of 0.95 for identifying eligible patients. This pilot study suggests that text-based classification holds promise for identifying potential clinical trial subjects. Our methods require further validation in studies involving multiple inclusion and exclusion criteria.
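
The reported metric, area under the ROC curve, can be read as the probability that a randomly chosen eligible patient receives a higher classifier score than a randomly chosen ineligible one. The following is a minimal sketch of that computation in plain Python, not the authors' evaluation code. The function name `roc_auc` and the toy labels and scores are hypothetical and chosen only for illustration.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve via pairwise comparison (Mann-Whitney form).

    labels: 1 = eligible according to the reference labels, 0 = not eligible.
    scores: classifier outputs, where a higher score means "more likely eligible".
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0   # eligible patient ranked above ineligible one
            elif p == n:
                wins += 0.5   # ties count as half a win
    return wins / (len(pos) * len(neg))


# Hypothetical toy data (not from the study): scores from a model trained on
# ICD-9-derived labels, compared against clinician-assigned eligibility labels.
gold_labels = [1, 1, 1, 0, 0, 0, 0, 0]
model_scores = [0.92, 0.81, 0.40, 0.55, 0.30, 0.22, 0.10, 0.05]

auc = roc_auc(gold_labels, model_scores)
print(f"AUC = {auc:.3f}")
```

The pairwise form is quadratic in the number of examples, which is fine for a small labeled test set. A rank-based implementation would be preferable for larger ones.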
