A question-answering system over Traditional Chinese Medicine

Traditional Chinese Medicine (TCM) has been practiced for thousands of years and is a significant part of Chinese cultural heritage. Its theoretical framework is unique and rich in content, and it encodes complex relationships between diseases and medicines. Research on question answering (QA) over TCM is both significant and representative for Chinese NLP, because TCM resources are mostly written in Chinese. In this paper we present a QA system over TCM that transforms user-supplied questions into conjunctive queries (i.e., SQL) and retrieves answers from both a purpose-built dataset and an online encyclopedia. The contribution of this paper is threefold. First, we introduce a novel approach to word segmentation for Chinese questions: we apply a TF-IDF model to the dataset to generate a domain-specific dictionary whose entries carry weight factors and tags, and we use these weights and tags to score candidate segmentations and select the best one. Second, we present a novel method for constructing queries to retrieve answers: we compute the entity-attribute distance over a set of tagged words to construct incomplete ontology instances, which serve as an intermediate representation from which queries are generated. Third, we propose a method for integrating web data extraction with question answering, which allows us to extract answers from an online encyclopedia website (i.e., Wikipedia). The results of our evaluation on 50 benchmark queries demonstrate the value of our approach.
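
To make the first contribution more concrete, the following is a minimal sketch of dictionary-weighted segmentation, not the system's actual implementation. It assumes that a segmentation is scored by summing the weights of its dictionary words and that the highest-scoring split is chosen by dynamic programming; the abstract does not specify the scoring rule, so this is only one plausible reading. The dictionary entries, weights, tags, `UNKNOWN_PENALTY`, and `MAX_WORD_LEN` are all hypothetical values chosen for illustration.

```python
import math

# Hypothetical domain dictionary: word -> (weight, tag).
# In the described system the weights would come from a TF-IDF model;
# the numbers and tags below are invented for this example.
DICTIONARY = {
    "人参": (3.2, "herb"),
    "功效": (2.5, "attribute"),
    "的": (0.1, "particle"),
    "是": (0.1, "verb"),
    "什么": (0.3, "question"),
    "人": (0.5, "noun"),
}

UNKNOWN_PENALTY = -1.0  # assumed score for a single out-of-dictionary character
MAX_WORD_LEN = 4        # assumed upper bound on dictionary word length


def segment(question):
    """Return the highest-scoring segmentation as a list of (word, tag) pairs."""
    n = len(question)
    # best[i] = (best score of any segmentation of question[:i],
    #            start index of the last word in that segmentation)
    best = [(-math.inf, None)] * (n + 1)
    best[0] = (0.0, None)

    for end in range(1, n + 1):
        for start in range(max(0, end - MAX_WORD_LEN), end):
            word = question[start:end]
            if word in DICTIONARY:
                gain = DICTIONARY[word][0]
            elif end - start == 1:
                # Unknown single characters are always allowed, so every
                # position stays reachable.
                gain = UNKNOWN_PENALTY
            else:
                continue
            score = best[start][0] + gain
            if score > best[end][0]:
                best[end] = (score, start)

    # Walk the back-pointers from the end of the question to recover the words.
    words = []
    end = n
    while end > 0:
        start = best[end][1]
        word = question[start:end]
        tag = DICTIONARY.get(word, (0.0, "unknown"))[1]
        words.append((word, tag))
        end = start
    words.reverse()
    return words


if __name__ == "__main__":
    # "What is the effect of ginseng?"
    print(segment("人参的功效是什么"))
```

With this toy dictionary, the question 人参的功效是什么 is split into 人参 / 的 / 功效 / 是 / 什么, because the two-character herb entry 人参 outweighs the alternative of 人 followed by an unknown character. The tags attached to each word (here `herb` and `attribute`) are the kind of information a later entity-attribute matching step could consume.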