Medical speciality classification system based on binary particle swarms and ensemble of one vs. rest support vector machines

Artificial intelligence now plays an integral role in medical and healthcare informatics. Developing an automatic question classification and answering system is essential for coping with constant advancements in science and technology. Efficient online medical services are also required to promote offline medical services. This article proposes a system that automatically classifies patients' medical questions into medical specialties and supports the Arabic language in the MENA region. Text classification is not trivial, especially when dealing with a highly morphologically complex language whose dialectal form is the dominant form on the Internet. This work utilizes 15,000 medical questions asked by the clients of the Altibbi telemedicine company. The questions are classified into 15 medical specialties. As the number of medical questions the company receives daily has increased, a need has arisen for an automatic classification system that can save medical personnel considerable time and effort. Therefore, this article presents an efficient medical speciality classification system based on swarm intelligence (SI) and an ensemble of support vector machines (SVMs). Particle swarm optimization (PSO), a stochastic SI-based metaheuristic algorithm, is adopted to search for the optimal number of features and to tune the hyperparameters of the SVM classifiers, which are deployed in a one-versus-rest scheme for multi-class classification. In addition, PSO is integrated with various binarization techniques to boost its performance. The experimental results show that the proposed approach achieved an accuracy of 85% and a feature reduction rate of 95.9%.
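The binarization step mentioned in the abstract can be illustrated with a minimal sketch. It assumes the classic S-shaped (sigmoid) transfer function for binary PSO and a weighted-sum fitness that trades accuracy against feature reduction. The abstract does not specify which transfer functions, fitness formulation, or parameter values the authors used, so the function names and the values of `alpha`, `w`, `c1`, `c2`, and `v_max` below are illustrative assumptions, not the authors' settings.

```python
import math
import random


def sigmoid(v):
    """S-shaped transfer function mapping a velocity to a probability."""
    return 1.0 / (1.0 + math.exp(-v))


def update_velocity(velocity, position, personal_best, global_best, rng,
                    w=0.7, c1=1.5, c2=1.5, v_max=6.0):
    """Standard PSO velocity update, clamped to [-v_max, v_max].

    w, c1, c2 and v_max are assumed values chosen for illustration.
    """
    new_velocity = []
    for v, x, p, g in zip(velocity, position, personal_best, global_best):
        v = w * v + c1 * rng.random() * (p - x) + c2 * rng.random() * (g - x)
        new_velocity.append(max(-v_max, min(v_max, v)))
    return new_velocity


def binarize(velocity, rng):
    """Sample a 0/1 feature mask: bit i is 1 with probability sigmoid(v_i)."""
    return [1 if rng.random() < sigmoid(v) else 0 for v in velocity]


def fitness(accuracy, mask, alpha=0.99):
    """Weighted sum of classifier accuracy and feature reduction rate.

    alpha is an assumed weight; accuracy would come from evaluating the
    SVM ensemble on the features selected by the mask.
    """
    reduction = 1.0 - sum(mask) / len(mask)
    return alpha * accuracy + (1.0 - alpha) * reduction


if __name__ == "__main__":
    rng = random.Random(0)
    position = [1, 0, 1, 0]
    velocity = update_velocity([0.0] * 4, position, [1, 1, 0, 0], [1, 1, 1, 0], rng)
    mask = binarize(velocity, rng)
    print(mask, fitness(0.85, mask))
```

In a full pipeline, each particle's mask would select columns of the text feature matrix, the one-versus-rest SVMs would be trained on that subset, and the resulting validation accuracy would be passed to `fitness`.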
