Classification of medical documents according to diseases

Medical text classification remains a popular research problem within the text classification domain. Apart from some studies that use text compiled from hospital records, most researchers in this field evaluate their classification methods on documents from the MEDLINE database. Taken as a whole, MEDLINE is a multi-class and multi-label database: a single document can belong to several categories at once. In this study, a dataset containing a small subset of MEDLINE documents belonging to disease categories is constructed; unlike MEDLINE as a whole, it is multi-class but single-label. Because the class distribution of this dataset is highly unbalanced, only documents belonging to the top 10 disease categories by document count are used in the experiments. The performance of three pattern classifiers, namely a Bayesian network, a C4.5 decision tree, and Random Forest, is analyzed on the disease classification problem using this dataset. Experiments are conducted for two cases: with and without stemming as a preprocessing step. The results show that the Bayesian network is the most successful of the three classifiers, and that the best performance is obtained without stemming.
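The experimental pipeline summarized above can be sketched in plain Python. The sketch keeps the k largest disease categories, builds bag-of-words features with or without stemming, and then trains a classifier. It is a minimal sketch under several assumptions. The `(text, label)` input format and every function name are hypothetical. `crude_stem` is a toy suffix stripper, not the Porter stemmer. Multinomial naive Bayes is swapped in as a simple stand-in classifier and is not the Bayesian network, C4.5, or Random Forest implementation evaluated in the study. The toy documents are illustrative only and are not MEDLINE data.

```python
import math
from collections import Counter, defaultdict

# Hypothetical input format (an assumption, not the study's actual data layout):
# a list of (text, disease_label) pairs with exactly one label per document.


def keep_top_k_categories(docs, k=10):
    """Keep only documents whose label is among the k most frequent labels."""
    counts = Counter(label for _, label in docs)
    top = {label for label, _ in counts.most_common(k)}
    return [(text, label) for text, label in docs if label in top]


# Toy suffix stripper that only illustrates the stemming on/off switch.
# It is NOT the Porter algorithm; a real experiment would use a full stemmer.
_SUFFIXES = ("ations", "ation", "ing", "es", "ed", "s")


def crude_stem(word):
    for suffix in _SUFFIXES:
        # Strip the first matching suffix only if at least 3 characters remain.
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokenize(text, stem):
    # Lowercase and split on any non-alphanumeric character.
    words = "".join(c.lower() if c.isalnum() else " " for c in text).split()
    return [crude_stem(w) for w in words] if stem else words


def bag_of_words(docs, stem):
    """Return one term-frequency Counter per document, plus the labels."""
    vectors = [Counter(tokenize(text, stem)) for text, _ in docs]
    labels = [label for _, label in docs]
    return vectors, labels


# Stand-in classifier: multinomial naive Bayes with add-one smoothing.
# It is NOT the Bayesian network, C4.5, or Random Forest used in the study.
def train_nb(vectors, labels):
    class_docs = Counter(labels)            # documents per class (for priors)
    term_counts = defaultdict(Counter)      # term frequencies per class
    for vec, label in zip(vectors, labels):
        term_counts[label].update(vec)
    vocab = {term for vec in vectors for term in vec}
    return class_docs, term_counts, vocab


def predict_nb(model, vec):
    class_docs, term_counts, vocab = model
    n_docs = sum(class_docs.values())
    best_label, best_score = None, -math.inf
    for label, n in class_docs.items():
        total = sum(term_counts[label].values())
        score = math.log(n / n_docs)        # log prior
        for term, freq in vec.items():
            if term in vocab:               # ignore terms unseen in training
                p = (term_counts[label][term] + 1) / (total + len(vocab))
                score += freq * math.log(p)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Toy, hand-written documents (illustrative only, not MEDLINE data).
docs = [
    ("Insulin dosing in diabetes", "diabetes"),
    ("Diabetes and kidney disease", "diabetes"),
    ("Diabetic foot ulcers", "diabetes"),
    ("Asthma inhaler adherence", "asthma"),
    ("Childhood asthma triggers", "asthma"),
    ("A rare case of porphyria", "porphyria"),
]
filtered = keep_top_k_categories(docs, k=2)  # drops the lone "porphyria" doc
for stem in (False, True):
    vectors, labels = bag_of_words(filtered, stem)
    model = train_nb(vectors, labels)
    query = Counter(tokenize("asthma triggers", stem))
    print("stemming" if stem else "no stemming", predict_nb(model, query))
```

Running the same classifier once with `stem=False` and once with `stem=True` mirrors the two experimental conditions described above. Stemming merges surface forms such as "triggers" and "triggered" into one feature. On a toy corpus this indicates nothing about which setting performs better on MEDLINE.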
