A framework and its empirical study of automatic diagnosis of traditional Chinese medicine utilizing raw free-text clinical records

Automatic diagnosis is one of the most important components of expert systems for traditional Chinese medicine (TCM), and it has been widely studied in recent years. Most previous research is based on well-structured datasets that were manually collected, structured, and normalized by TCM experts. However, the results of that work cannot be applied directly and effectively to clinical practice, because raw free-text clinical records differ substantially from such datasets: they are unstructured, and TCM doctors write them during routine diagnostic work without the support of an authoritative editorial board. This paper therefore proposes and investigates, for the first time, a framework for automatic TCM diagnosis that uses raw free-text clinical records and is intended for clinical practice. A series of methods is applied to address several challenges within the framework, and the Naïve Bayes and Support Vector Machine classifiers are employed for automatic diagnosis. The feasibility of the framework is validated by evaluating the performance of each of its modules, and its effectiveness is demonstrated by the precision, recall, and F-measure of the automatic diagnosis results.
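As an illustration of the evaluation measures named above, the following minimal Python sketch computes precision, recall, and the balanced F-measure for a single diagnosis label. It is not the authors' implementation: the function name `precision_recall_f1` and the labels `syndrome_a` and `syndrome_b` are hypothetical, and the balanced (F1) form of the F-measure is assumed because the abstract does not state the weighting.

```python
from typing import Sequence, Tuple


def precision_recall_f1(
    gold: Sequence[str], predicted: Sequence[str], label: str
) -> Tuple[float, float, float]:
    """Return (precision, recall, F1) for one label, treated as the positive class.

    Hypothetical helper for illustration only; the balanced F-measure is an
    assumption, since the source does not specify the weighting.
    """
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")

    pairs = list(zip(gold, predicted))
    # True positives: the label was predicted and it is correct.
    tp = sum(1 for g, p in pairs if g == label and p == label)
    # False positives: the label was predicted but the gold diagnosis differs.
    fp = sum(1 for g, p in pairs if g != label and p == label)
    # False negatives: the gold diagnosis is the label but it was not predicted.
    fn = sum(1 for g, p in pairs if g == label and p != label)

    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Made-up gold and predicted diagnoses for five records (not data from the paper).
gold = ["syndrome_a", "syndrome_a", "syndrome_b", "syndrome_b", "syndrome_a"]
pred = ["syndrome_a", "syndrome_b", "syndrome_b", "syndrome_a", "syndrome_a"]
p, r, f = precision_recall_f1(gold, pred, "syndrome_a")
print(f"precision={p:.3f} recall={r:.3f} F1={f:.3f}")
```

In a multi-class setting such as syndrome diagnosis, these per-label scores would typically be averaged over all labels; whether the paper uses macro or micro averaging is not stated in the abstract.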
