Knowledge discovery in traditional Chinese medicine: State of the art and perspectives

OBJECTIVE As a complementary medical system to Western medicine, traditional Chinese medicine (TCM) has provided a unique theoretical and practical approach to the treatment of diseases for thousands of years. Given the increasing popularity of TCM and the huge volume of TCM data, both historically accumulated and recently obtained, there is an urgent need to explore these resources effectively using the techniques of knowledge discovery in databases (KDD). This paper provides an overview of recent KDD studies in the TCM field.

METHODS A literature search was conducted across both English and Chinese publications, and the major studies of knowledge discovery in TCM (KDTCM) reported in them were identified. Following an introduction to the state of the art of TCM data resources, four subfields of KDTCM research are reviewed: KDD for research on Chinese medical formulae, KDD for research on Chinese herbal medicine, KDD for TCM syndrome research, and KDD for TCM clinical diagnosis. For each subfield, the current state and main problems are summarized based on a discussion of existing studies, and future directions are proposed accordingly.

RESULTS A range of KDD methods is used in existing KDTCM research, from conventional frequent itemset mining to state-of-the-art latent structure models. These methods have yielded a considerable number of interesting discoveries, such as novel TCM paired drugs discovered by frequent itemset analysis, a functional community of related genes discovered from a syndrome perspective by text mining, the high proportion of toxic plants in the botanical family Ranunculaceae disclosed by statistical analysis, and the association between M-cholinoceptor blocking drugs and Solanaceae revealed by association rule mining. It is particularly encouraging to see some studies connecting TCM with biomedicine, which provide a novel top-down view for functional genomics research. However, further development of KDD methods is still needed to better adapt them to the features of TCM.

CONCLUSIONS Existing studies demonstrate that KDTCM is effective in obtaining medical discoveries. However, much more work needs to be done in order to discover real diamonds in the TCM domain. The future use and development of KDTCM will contribute substantially to the TCM community, as well as to modern life science.
