Agricultural Knowledge Discovery from Semi-Structured Text

This research aims to develop an automatic knowledge discovery system that works on semi-structured Thai text to support plant disease diagnosis. Diagnosis matters to farmers because it lets them treat infected plants before the infection becomes more severe. Before diagnosing, farmers need knowledge that comes primarily from text, including unstructured and semi-structured documents. Because this knowledge is scattered throughout the text, collecting all of it manually is time consuming. An alternative to the manual approach is an automatic knowledge discovery process that acquires concise knowledge for plant disease diagnosis. This process consists of at least two main steps: knowledge extraction and knowledge generalization.

There are two major problems in this research. The first is the knowledge extraction problem, which stems from linguistic phenomena such as zero anaphora and ellipsis and can be addressed with NLP techniques. The second is the generalization problem: the general knowledge obtained is intrinsically uncertain and incomplete. To solve these problems we propose a combination of three techniques. First, template-matching rules are used to extract knowledge from agricultural documents on websites. Second, a Monte Carlo simulation technique is applied to handle the incomplete knowledge of plant disease symptoms found in the texts. Third, a fuzzy concept is used to determine the weighted average of the generality of a symptom for each pathogen type or insect type. The results of knowledge generalization will be evaluated by experts, and knowledge extraction will be evaluated in terms of precision and recall. Note that this work is part of ongoing research.

Keywords: knowledge extraction, knowledge generalization, knowledge discovery, fuzzy, Monte Carlo simulation, template matching rule

C. Pechsiri, A. Kongwan and A. Kawtrakul, The Specialty Research Unit of Natural Language Processing and Intelligent Information System Technology, Department of Computer Engineering, Kasetsart University, Bangkok, Thailand, chaveevan@vivaldi.cpe.ku.ac.th
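
The evaluation of knowledge extraction in terms of precision and recall could be sketched as follows. This is a minimal illustration only, assuming that extracted knowledge is represented as (pathogen or insect type, symptom) pairs and compared against a hand-labeled reference set. The function name `precision_recall` and the sample pairs are hypothetical and do not come from the research itself.

```python
def precision_recall(extracted, gold):
    """Return (precision, recall) of extracted items against a reference set."""
    extracted, gold = set(extracted), set(gold)
    true_positives = len(extracted & gold)  # items that are both extracted and correct
    # Guard against empty sets to avoid division by zero.
    precision = true_positives / len(extracted) if extracted else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    return precision, recall


# Hypothetical (pathogen/insect type, symptom) pairs, for illustration only.
gold = {
    ("fungus", "leaf spot"),
    ("fungus", "wilting"),
    ("virus", "leaf curl"),
    ("insect", "holes in leaves"),
}
extracted = {
    ("fungus", "leaf spot"),
    ("virus", "leaf curl"),
    ("virus", "wilting"),  # an incorrect extraction
}

precision, recall = precision_recall(extracted, gold)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Precision is the fraction of extracted pairs that are correct, and recall is the fraction of reference pairs that were found. Treating each pair as a set element is an assumption made for simplicity; a real evaluation might score partial matches differently.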