Classifying Eligibility Criteria in Clinical Trials Using Active Deep Learning

In this paper, we propose an active deep learning approach to automatically classify the eligibility criteria of clinical trials, an application that has not been explored in machine learning. We collected all clinical trial data from the National Cancer Institute website and applied word2vec to learn word embeddings for eligibility criteria. Criteria encoded with word embeddings were then fed into a multi-layer convolutional neural network (CNN) for classification. To overcome the lack of existing class labels, we designed an active learning algorithm that uses uncertainty cluster sampling to navigate the dataset and strategically propagates the obtained labels to expand the training set for the CNN. Experimental results show that word2vec successfully learns meaningful embeddings from the criteria data, and that the active deep learning approach achieves a significantly lower classification error rate than the baseline k-nearest neighbor method.
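
The abstract does not spell out how uncertainty cluster sampling and label propagation work, so the following is only a minimal sketch of one plausible reading, not the authors' algorithm. It assumes that unlabeled criteria have already been grouped into clusters (for example, on their embeddings), that the classifier outputs a class probability vector per criterion, that uncertainty is measured by Shannon entropy, and that a single label obtained from an annotator is copied to every member of the selected cluster. All function names, the toy criteria, the probabilities, and the label `prior_therapy` are hypothetical.

```python
import math
from collections import defaultdict


def entropy(probs):
    """Shannon entropy (in nats) of one predicted class distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)


def most_uncertain_cluster(probs_by_item, cluster_of):
    """Return the id of the cluster whose members have the highest mean entropy."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for item, probs in probs_by_item.items():
        cluster_id = cluster_of[item]
        totals[cluster_id] += entropy(probs)
        counts[cluster_id] += 1
    return max(totals, key=lambda c: totals[c] / counts[c])


def propagate_label(cluster_id, label, cluster_of, labeled):
    """Copy one annotator-provided label to every unlabeled member of a cluster."""
    for item, item_cluster in cluster_of.items():
        if item_cluster == cluster_id and item not in labeled:
            labeled[item] = label
    return labeled


# Hypothetical toy data: classifier probabilities and cluster assignments.
probs_by_item = {
    "age >= 18 years": [0.9, 0.1],
    "age 18 or older": [0.8, 0.2],
    "no prior chemotherapy": [0.5, 0.5],
    "no previous chemo": [0.6, 0.4],
}
cluster_of = {
    "age >= 18 years": 0,
    "age 18 or older": 0,
    "no prior chemotherapy": 1,
    "no previous chemo": 1,
}

# Pick the cluster the classifier is least sure about, then propagate the
# label an annotator would supply for it (hard-coded here for illustration).
picked = most_uncertain_cluster(probs_by_item, cluster_of)
labeled = propagate_label(picked, "prior_therapy", cluster_of, {})
```

In this toy run the second cluster is selected because its members have near-uniform predictions, and both of its criteria receive the same label. In a full loop, the newly labeled items would be added to the training set and the classifier retrained before the next query.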
