Automatic Text Classification of ICD-10 Related CoD from Complex and Free Text Forensic Autopsy Reports

Forensic autopsy aims to determine the cause of death (CoD) by examining a dead body. In this study, various feature extraction schemes, feature value representation schemes, and text classification algorithms were applied to forensic autopsy reports to identify the most suitable approach at each of these three stages. The experimental results show that unigram features outperformed bigram features, trigram features, and hybrids of unigram, bigram, and trigram features. Moreover, the term frequency (TF) and TF-IDF feature value representation schemes proved more suitable than the binary and normalized TF-IDF schemes. Finally, support vector machine (SVM) decision models outperformed random forest (RF) and naive Bayes (NB) models.
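
To make the compared feature schemes concrete, the following is a minimal sketch of n-gram feature extraction and the four feature value representations named above (binary, TF, TF-IDF, and normalized TF-IDF). It is not the study's implementation: the function names, the whitespace tokenizer, the plain `log(N / df)` IDF formula, and the L2 normalization are all assumptions made for illustration.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the word n-grams of order n as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def extract_features(text, orders=(1,)):
    """Extract n-gram features; orders=(1, 2) gives a unigram+bigram hybrid.

    Assumption: lowercasing and whitespace splitting stand in for the
    preprocessing used in the study, which the abstract does not describe.
    """
    tokens = text.lower().split()
    features = []
    for n in orders:
        features.extend(ngrams(tokens, n))
    return features


def represent(docs, orders=(1,), scheme="tf"):
    """Build a document-term matrix under one feature value scheme.

    scheme is one of "binary", "tf", "tfidf", "tfidf_norm".
    Returns (vocabulary, rows), where rows[i][j] is the value of
    vocabulary[j] in document i.
    """
    doc_counts = [Counter(extract_features(d, orders)) for d in docs]
    vocab = sorted(set().union(*doc_counts))
    n_docs = len(docs)
    # Document frequency: number of documents containing each feature.
    df = Counter(term for counts in doc_counts for term in counts)
    # Assumed IDF variant: plain log(N / df), without smoothing.
    idf = {t: math.log(n_docs / df[t]) for t in vocab}

    rows = []
    for counts in doc_counts:
        if scheme == "binary":
            row = [1.0 if counts[t] else 0.0 for t in vocab]
        elif scheme == "tf":
            row = [float(counts[t]) for t in vocab]
        elif scheme in ("tfidf", "tfidf_norm"):
            row = [counts[t] * idf[t] for t in vocab]
            if scheme == "tfidf_norm":
                # Assumed normalization: scale each row to unit L2 length.
                norm = math.sqrt(sum(v * v for v in row))
                if norm > 0:
                    row = [v / norm for v in row]
        else:
            raise ValueError(f"unknown scheme: {scheme}")
        rows.append(row)
    return vocab, rows
```

Calling `represent(reports, orders=(1,), scheme="tf")` on a list of report strings yields unigram TF vectors, while `orders=(1, 2, 3)` yields the hybrid feature set. The resulting rows could then be passed to any classifier, such as an SVM, RF, or NB implementation.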
