Text Classification of Cancer Clinical Trials Documents Using Deep Neural Network and Fine Grained Document Clustering

Clinical trials are research studies that involve human participants and evaluate health and safety outcomes. A key concept in clinical trials is the eligibility criteria, which determine whether a participant is eligible or not eligible. Because eligibility criteria are usually written in free text, they must be interpreted before a computer can process them. The purpose of this paper is to classify cancer clinical trial texts from the public dataset at https://clinicaltrials.gov. The proposed approach applies supervised learning methods (K-Nearest Neighbor and Decision Tree), machine learning methods (Support Vector Machine and Random Forest), a deep neural network (Multilayer Perceptron), and fine-grained document clustering. This research contributes a new classification model for clinical trial documents and an improvement in computational speed. The results show the highest accuracy for the random forest method (90.5%) and the lowest accuracy for the multilayer perceptron method (72.1%).
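To make the classification task concrete, the following is a minimal sketch of one of the listed methods, K-Nearest Neighbor, applied to free-text eligibility statements. It is not the authors' implementation: the TF-IDF features, cosine similarity, whitespace tokenizer, and value of k are all assumptions (the abstract does not specify them), and the six training sentences are invented placeholders rather than records from clinicaltrials.gov.

```python
import math
from collections import Counter

# Hypothetical toy examples; real data would come from clinicaltrials.gov.
TRAIN = [
    ("patients with measurable disease and adequate organ function", "eligible"),
    ("histologically confirmed breast cancer with adequate renal function", "eligible"),
    ("age 18 years or older with confirmed diagnosis", "eligible"),
    ("prior chemotherapy within four weeks is excluded", "not eligible"),
    ("pregnant or breastfeeding women are excluded", "not eligible"),
    ("active uncontrolled infection is excluded", "not eligible"),
]


def tokenize(text):
    """Lowercase and split on whitespace (an assumed, deliberately simple tokenizer)."""
    return text.lower().split()


def fit_idf(docs):
    """Compute a smoothed inverse document frequency for every training term."""
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1.0 for term, count in df.items()}


def tfidf(text, idf):
    """Return an L2-normalized sparse TF-IDF vector; unseen terms are ignored."""
    counts = Counter(t for t in tokenize(text) if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}


def cosine(a, b):
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(v * b.get(t, 0.0) for t, v in a.items())


def knn_predict(text, train_vecs, train_labels, idf, k=3):
    """Label a text by majority vote among its k most similar training texts."""
    query = tfidf(text, idf)
    ranked = sorted(
        zip(train_vecs, train_labels),
        key=lambda pair: cosine(query, pair[0]),
        reverse=True,
    )
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


TEXTS = [text for text, _ in TRAIN]
LABELS = [label for _, label in TRAIN]
IDF = fit_idf(TEXTS)
TRAIN_VECS = [tfidf(text, IDF) for text in TEXTS]

if __name__ == "__main__":
    print(knn_predict("pregnant women are excluded", TRAIN_VECS, LABELS, IDF))
```

The other classifiers named in the abstract could consume the same feature vectors; only the prediction step would change.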
