Extracting a Sequence of Cause-Effect Concept Pairs from Texts

This research aims to extract event pairs, in particular cause-effect concept pairs, from disease documents downloaded from hospital web boards. The event pairs are expressed as a sequence of cause-effect concept pairs that line up to form a causal chain for a given disease. Such a causal chain, which includes the root cause, is useful for problem-solving systems. Each causative or effect event concept is expressed by the verb phrase of an elementary discourse unit (EDU), which is a simple sentence. The research addresses three problems: how to determine whether each pair of adjacent EDUs holds the cause-effect relation; how to determine the causal-chain expression in a document where adjacent-EDU pairs with the cause-effect relation are mixed with pairs that lack it; and how to collect and represent the sequence of cause-effect pairs in a simple way. We therefore extract, from the EDUs' verb phrases, a set of nWordCo concepts that carry the causative and effect concepts, using a Bayesian network to determine the size of each nWordCo. We apply the Naïve Bayes classifier to learn and extract a cause-effect-relation template of nWordCo concept pairs from the documents, and we then propose using this template to extract the causal chain. We use the ArrayList data structure to collect and represent the sequence of cause-effect pairs. The results show that the causal chains are extracted from the documents with a high percentage of correctness.
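
The collection step can be illustrated with a minimal Java sketch. It is not the implementation used in this research. It assumes that each EDU has already been reduced to a single concept string and that the learned cause-effect-relation template is available as a predicate over two adjacent concepts. The names `CausalChainSketch`, `ConceptPair`, `extractChains`, and `isCauseEffect` are hypothetical and are introduced here only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.BiPredicate;

// Illustrative sketch only; all names here are hypothetical.
final class CausalChainSketch {

    /** One cause-effect concept pair taken from two adjacent EDUs. */
    static final class ConceptPair {
        final String cause;
        final String effect;

        ConceptPair(String cause, String effect) {
            this.cause = cause;
            this.effect = effect;
        }

        @Override
        public String toString() {
            return cause + " -> " + effect;
        }
    }

    /**
     * Scans adjacent EDU concepts in document order and groups consecutive
     * cause-effect pairs into chains. The isCauseEffect predicate is a
     * stand-in (assumption) for the learned cause-effect-relation template.
     */
    static List<List<ConceptPair>> extractChains(
            List<String> eduConcepts,
            BiPredicate<String, String> isCauseEffect) {
        List<List<ConceptPair>> chains = new ArrayList<>();
        List<ConceptPair> current = new ArrayList<>();
        for (int i = 0; i + 1 < eduConcepts.size(); i++) {
            String left = eduConcepts.get(i);
            String right = eduConcepts.get(i + 1);
            if (isCauseEffect.test(left, right)) {
                // The pair extends the chain that is currently being built.
                current.add(new ConceptPair(left, right));
            } else if (!current.isEmpty()) {
                // A non-cause-effect pair closes the current chain.
                chains.add(current);
                current = new ArrayList<>();
            }
        }
        if (!current.isEmpty()) {
            chains.add(current);
        }
        return chains;
    }
}
```

In this sketch, each inner ArrayList holds one causal chain in document order, so under these assumptions the cause of its first pair is the candidate root cause. A pair of adjacent EDUs without the cause-effect relation ends the current chain, which is one simple way to separate chains when cause-effect and non-cause-effect pairs are mixed in a document.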