Model-based reasoning methods for diagnosis in integrative medicine based on electronic medical records and natural language processing

Background: This study aimed to investigate model-based reasoning (MBR) algorithms for diagnosis in integrative medicine based on electronic medical records (EMRs) and natural language processing.

Methods: A total of 14,075 clinical case records were extracted from EMRs as the development dataset, and an external test dataset of 1,000 clinical case records was extracted from independent EMRs. MBR methods based on word embedding, machine learning, and deep learning algorithms were developed for the automatic diagnosis of syndrome patterns in integrative medicine. MBR algorithms combined with rule-based reasoning (RBR) were also developed. Performance was evaluated with standard metrics: accuracy, precision, recall, and F1 score. Using the best-performing algorithms, association analyses were conducted between performance and the sample size, the number of syndrome pattern types, and the diagnosis of lung diseases.

Results: The Word2Vec convolutional neural network (CNN) MBR algorithm showed high performance in syndrome pattern diagnosis (accuracy of 0.9586 on the test dataset). The Word2Vec CNN MBR algorithm combined with RBR also performed well (accuracy of 0.9229 on the test dataset). The lung disease diagnosis could enhance the performance of the Word2Vec CNN MBR algorithms. The sample size of each group and the syndrome pattern type affected the performance of these algorithms.

Conclusion: MBR methods based on Word2Vec and CNN showed high performance in syndrome pattern diagnosis in integrative medicine for lung diseases. The sample size of each group, the syndrome pattern type, and the lung disease diagnosis were associated with the performance of the methods.
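The abstract does not describe the network architecture. As a hedged illustration of what a "Word2Vec CNN" classifier generally computes, here is a minimal pure-Python sketch of the forward pass of a common single-layer text CNN. Each token is mapped to a word vector, convolutional filters slide over consecutive vectors, ReLU is applied, and max-over-time pooling keeps each filter's strongest activation. A linear layer followed by softmax then scores the syndrome-pattern classes. Everything here is an assumption. The random toy vectors stand in for Word2Vec embeddings trained on EMR text. The vocabulary, the class labels (`pattern_A`, `pattern_B`), the filter width, and the filter count are hypothetical. No training is shown. This is not the authors' model.

```python
import math
import random

random.seed(0)

# Hypothetical toy embedding table standing in for trained Word2Vec vectors.
EMBED_DIM = 4
VOCAB = ["cough", "yellow", "phlegm", "fever", "thirst", "white", "chills"]
EMBEDDINGS = {w: [random.uniform(-1, 1) for _ in range(EMBED_DIM)] for w in VOCAB}
UNK = [0.0] * EMBED_DIM  # out-of-vocabulary tokens map to a zero vector


def embed(tokens):
    """Look up one embedding vector per token."""
    return [EMBEDDINGS.get(t, UNK) for t in tokens]


def conv1d_max_pool(seq, filt, bias):
    """Slide one filter (len(filt) rows of EMBED_DIM weights) over the
    sequence, apply ReLU, and keep the maximum activation (max-over-time
    pooling). Sequences shorter than the filter yield 0.0."""
    width = len(filt)
    best = 0.0  # ReLU outputs are >= 0, so 0.0 is a valid lower bound
    for start in range(len(seq) - width + 1):
        s = bias
        for offset in range(width):
            s += sum(f * x for f, x in zip(filt[offset], seq[start + offset]))
        best = max(best, max(0.0, s))
    return best


def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def text_cnn_forward(tokens, filters, out_weights, out_bias):
    """Embed tokens, apply every (filter, bias) pair with max pooling,
    then a linear layer and softmax over the classes."""
    seq = embed(tokens)
    features = [conv1d_max_pool(seq, f, b) for f, b in filters]
    logits = [sum(w * h for w, h in zip(row, features)) + b
              for row, b in zip(out_weights, out_bias)]
    return softmax(logits)


# Randomly initialised parameters; a real model would learn these in training.
N_FILTERS, WIDTH = 3, 2
FILTERS = [([[random.uniform(-1, 1) for _ in range(EMBED_DIM)] for _ in range(WIDTH)], 0.0)
           for _ in range(N_FILTERS)]
CLASSES = ["pattern_A", "pattern_B"]  # hypothetical syndrome-pattern labels
OUT_W = [[random.uniform(-1, 1) for _ in range(N_FILTERS)] for _ in CLASSES]
OUT_B = [0.0 for _ in CLASSES]

probs = text_cnn_forward(["cough", "yellow", "phlegm", "fever"], FILTERS, OUT_W, OUT_B)
print(dict(zip(CLASSES, probs)))
```

Max-over-time pooling makes the feature vector length equal to the number of filters regardless of record length. This is why such CNNs can accept free-text records of varying size.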
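The abstract names accuracy, precision, recall, and F1 score as the evaluation metrics. It does not say how the per-class values were averaged across syndrome patterns. The minimal sketch below assumes macro averaging, which is an unweighted mean over classes. The study may instead have used weighted or micro averaging. The function name `classification_metrics` and the example labels are illustrative only.

```python
from collections import Counter


def classification_metrics(y_true, y_pred):
    """Accuracy plus macro-averaged precision, recall and F1.

    Macro averaging is an assumption; weighted or micro averaging would
    give different precision/recall/F1 values on imbalanced classes.
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equal length")
    labels = sorted(set(y_true) | set(y_pred))
    tp = Counter(t for t, p in zip(y_true, y_pred) if t == p)  # correct per class
    pred_counts = Counter(y_pred)  # how often each label was predicted
    true_counts = Counter(y_true)  # how often each label actually occurs

    precisions, recalls, f1s = [], [], []
    for label in labels:
        p = tp[label] / pred_counts[label] if pred_counts[label] else 0.0
        r = tp[label] / true_counts[label] if true_counts[label] else 0.0
        f = 2 * p * r / (p + r) if p + r else 0.0
        precisions.append(p)
        recalls.append(r)
        f1s.append(f)

    n = len(labels)
    return {
        "accuracy": sum(tp.values()) / len(y_true),
        "precision": sum(precisions) / n,
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }


# Hypothetical syndrome-pattern labels for five records.
metrics = classification_metrics(["A", "A", "B", "B", "C"],
                                 ["A", "B", "B", "B", "C"])
print(metrics)  # accuracy is 4 correct out of 5 = 0.8
```

Accuracy counts every record equally. Macro-averaged scores give rare syndrome patterns the same weight as common ones. This matters here because the abstract reports that group sample size affected performance.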