A data mining approach to selecting herbs with similar efficacy: Targeted selection methods based on medical subject headings (MeSH).

ETHNO-PHARMACOLOGICAL RELEVANCE: Natural products have long been the most important source of ingredients in the discovery of new drugs. Moreover, since the Nagoya Protocol, finding alternative herbs with similar efficacy in traditional medicine has become a very important issue. Random selection is a common method of finding ethno-medicinal herbs of similar efficacy, but it has proved less effective. This paper therefore proposes a novel targeted selection method that applies data mining approaches to the MEDLINE database in order to identify and select herbs with a similar degree of efficacy.

MATERIALS AND METHODS: From the sixteen categories of medical subject headings (MeSH) descriptors, three categories containing terms related to herbal compounds, efficacy, toxicity, and the metabolic process were selected. To select herbs of similar efficacy in a targeted way, we adopted a similarity measurement method based on MeSH. To evaluate the proposed algorithm, we built three different validation datasets, each containing a list of original herbs and the corresponding medicinal herbs of similar efficacy.

RESULTS: The average area under the curve (AUC) of the proposed algorithm was found to be about 500% larger than that of the random selection method. The proposed algorithm places more hits at the front of the top-10 list than the random selection method does, and it precisely discerns the efficacy of the herbs. The AUC of the experiments either remained the same or increased slightly in all three validation datasets as the search range was increased.

CONCLUSION: This study shows that the proposed algorithm is significantly more accurate and efficient than the random selection method in finding alternative herbs of similar efficacy. As such, it is hoped that this approach will be used in diverse applications in the ethno-pharmacology field.
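The targeted selection idea can be illustrated with a minimal sketch. The abstract does not specify the similarity measure, so the example below assumes a plain Jaccard overlap between sets of MeSH descriptors as a stand-in; it is not the authors' algorithm. The herb names, the descriptor sets, and the function names `jaccard` and `rank_similar_herbs` are hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Set, Tuple


def jaccard(a: Set[str], b: Set[str]) -> float:
    """Jaccard overlap of two descriptor sets (assumed stand-in measure)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def rank_similar_herbs(
    query: str,
    profiles: Dict[str, Set[str]],
    top_n: int = 10,
) -> List[Tuple[str, float]]:
    """Rank all other herbs by descriptor overlap with the query herb.

    Ties are broken alphabetically so the ranking is deterministic.
    """
    query_terms = profiles[query]
    scored = [
        (name, jaccard(query_terms, terms))
        for name, terms in profiles.items()
        if name != query
    ]
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:top_n]


# Hypothetical herb profiles: each herb is described by the MeSH
# descriptors attached to the literature that mentions it.
profiles = {
    "herb_a": {"Anti-Inflammatory Agents", "Flavonoids", "Antioxidants"},
    "herb_b": {"Anti-Inflammatory Agents", "Flavonoids", "Saponins"},
    "herb_c": {"Antioxidants", "Alkaloids"},
    "herb_d": {"Cardiotonic Agents"},
}

ranking = rank_similar_herbs("herb_a", profiles)
for name, score in ranking:
    print(f"{name}\t{score:.2f}")
```

In this toy setting the top-10 list is simply the highest-scoring candidates; a validation dataset of known substitute herbs could then be used to check how many of them appear near the front of the list.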
