The aim of this paper is to automatically extract the medicinal properties of an object, especially an herb, from technical documents. The extracted knowledge serves as a source for health-care problem solving through a question-answering system for disease treatment, especially for What-questions. The extracted medicinal property knowledge spans multiple simple sentences, or EDUs (Elementary Discourse Units). Extracting this knowledge involves three problems: identifying the herbal object, identifying the medicinal properties of each object, and determining the boundary of each medicinal property. We propose using NLP (Natural Language Processing) with a statistics-based approach to identify the medicinal properties, together with a machine learning technique, Naive Bayes with verb features, to solve the boundary problem. The results show that medicinal property extraction achieves a precision of 86% and a recall of 77%, and that boundary determination achieves 87% correctness.
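The abstract does not give the details of the boundary classifier. What follows is only a minimal sketch of one way Naive Bayes with verb features could decide where a medicinal-property span ends, under stated assumptions. It assumes each EDU has already been reduced to the list of verbs it contains, and that each EDU is labeled "in" if it still belongs to the current medicinal-property span and "out" if it ends that span. The class `VerbNaiveBayes`, the function `property_boundary`, the labels, and the toy English verbs are all hypothetical illustrations, not the paper's implementation or data.

```python
import math
from collections import Counter, defaultdict


class VerbNaiveBayes:
    """Multinomial Naive Bayes over verb features with Laplace smoothing.

    Hypothetical setup: each EDU is a list of its verbs, and the label is
    "in" (EDU continues the medicinal-property span) or "out" (it ends it).
    """

    def __init__(self, alpha=1.0):
        self.alpha = alpha                      # Laplace smoothing constant
        self.class_counts = Counter()           # label -> number of EDUs
        self.verb_counts = defaultdict(Counter) # label -> verb -> count
        self.vocab = set()

    def fit(self, edus, labels):
        for verbs, label in zip(edus, labels):
            self.class_counts[label] += 1
            self.verb_counts[label].update(verbs)
            self.vocab.update(verbs)
        return self

    def log_posterior(self, verbs, label):
        # log P(label) + sum of log P(verb | label), up to a shared constant
        total_edus = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total_edus)
        total_verbs = sum(self.verb_counts[label].values())
        denom = total_verbs + self.alpha * len(self.vocab)
        for v in verbs:
            score += math.log((self.verb_counts[label][v] + self.alpha) / denom)
        return score

    def predict(self, verbs):
        # Pick the label with the highest (unnormalized) log posterior
        return max(self.class_counts, key=lambda c: self.log_posterior(verbs, c))


def property_boundary(edus_verbs, model):
    """Count consecutive EDUs, from the first, that the model keeps "in" the span."""
    n = 0
    for verbs in edus_verbs:
        if model.predict(verbs) != "in":
            break
        n += 1
    return n


# Hypothetical toy training data (illustrative only, not from the paper)
train_edus = [
    ["treat", "relieve"], ["cure", "reduce"], ["relieve"],
    ["grow", "found"], ["harvest"], ["grow", "plant"],
]
train_labels = ["in", "in", "in", "out", "out", "out"]
model = VerbNaiveBayes().fit(train_edus, train_labels)

print(model.predict(["relieve", "treat"]))  # in
print(model.predict(["grow"]))              # out
print(property_boundary([["treat"], ["relieve", "reduce"], ["grow"], ["treat"]], model))  # 2
```

Here the span is taken to end at the first EDU classified as "out". This is a simplifying choice for the sketch. A real system might also use features beyond verbs or smooth decisions across neighboring EDUs.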