Multi-Word Co-occurrence Collection from Texts for Health-Problem Diagnosis

This research aims to collect multi-word co-occurrences that express health-problem/symptom concepts from web-board documents, for use in health-problem diagnosis. The results can assist ordinary people in the preliminary diagnosis of health problems. Each multi-word co-occurrence in this research is based on an event expression realized as a verb phrase. The research addresses two main problems. The first is how to identify a multi-word co-occurrence with a symptom concept, including its boundary, after stop-word removal. The second is the ambiguity of multi-word co-occurrence concepts. Machine learning with Naïve Bayes is therefore applied to determine which consecutive words of the verb phrase (after stop-word elimination) form a multi-word co-occurrence with a symptom concept. The results show that symptom concepts can be determined from multi-word co-occurrences in documents with high precision.
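The boundary-determination step described above can be illustrated with a minimal sketch. It is not the authors' implementation: the stop-word list, the feature set (previous word and current word), the training examples, and all function and class names are hypothetical and chosen only for illustration. The sketch removes stop words, then starts at the verb and uses a small Naïve Bayes classifier with Laplace smoothing to decide, word by word, whether the next word still belongs to the multi-word co-occurrence.

```python
import math
from collections import Counter

# Hypothetical stop-word list; the abstract does not specify one.
STOP_WORDS = {"a", "the", "in", "my", "i", "of", "and"}


def remove_stop_words(tokens):
    """Drop stop words before boundary determination."""
    return [t for t in tokens if t.lower() not in STOP_WORDS]


def features(prev_word, word):
    """Assumed feature set: the previous kept word and the candidate word."""
    return [f"prev={prev_word}", f"cur={word}"]


class WordNaiveBayes:
    """Binary Naive Bayes with Laplace (add-one) smoothing."""

    def __init__(self):
        self.class_counts = Counter()
        self.feature_counts = {True: Counter(), False: Counter()}
        self.vocab = set()

    def train(self, examples):
        for feats, label in examples:
            self.class_counts[label] += 1
            for f in feats:
                self.feature_counts[label][f] += 1
                self.vocab.add(f)

    def log_score(self, feats, label):
        total = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / total)
        denom = sum(self.feature_counts[label].values()) + len(self.vocab)
        for f in feats:
            score += math.log((self.feature_counts[label][f] + 1) / denom)
        return score

    def predict(self, feats):
        return self.log_score(feats, True) > self.log_score(feats, False)


def extract_co_occurrence(tokens, verb_index, model):
    """Extend the co-occurrence from the verb until the classifier says stop."""
    words = [tokens[verb_index]]
    for word in tokens[verb_index + 1:]:
        if not model.predict(features(words[-1], word)):
            break  # boundary of the multi-word co-occurrence reached
        words.append(word)
    return words


# Hypothetical training data: True means the word continues the co-occurrence.
training = [
    (features("feel", "pain"), True),
    (features("pain", "stomach"), True),
    (features("feel", "dizzy"), True),
    (features("stomach", "yesterday"), False),
    (features("dizzy", "morning"), False),
    (features("pain", "today"), False),
]

model = WordNaiveBayes()
model.train(training)

tokens = remove_stop_words("I feel a pain in my stomach yesterday".split())
result = extract_co_occurrence(tokens, verb_index=0, model=model)
print(result)  # prints ['feel', 'pain', 'stomach']
```

In this toy run the classifier accepts "pain" and "stomach" as continuations of the verb "feel" and rejects "yesterday", so the boundary falls after "stomach". A real system would need a verb-phrase identifier, a much larger annotated corpus, and a mapping from the extracted co-occurrence to a symptom concept, none of which is shown here.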