Comparative Study of Using Word Co-occurrence to Extract Disease Symptoms from Web Documents

This research compares two word co-occurrence sizes on verb phrases, the two-word co-occurrence and the N-word co-occurrence, for extracting disease symptom explanations from downloaded hospital documents. The results are used to construct semantic relations between disease-topic names and symptom explanations, which enhance an automatic problem-solving system. Two techniques are proposed to determine the boundary of the simple sentences that explain the symptoms: a machine learning technique, the Support Vector Machine, for the two-word co-occurrence, and a similarity score determination for the N-word co-occurrence. On these documents, symptom extraction with the N-word co-occurrence achieves higher precision than extraction with the two-word co-occurrence.
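The difference between the two co-occurrence sizes, and the idea of using a similarity score to decide where a symptom explanation ends, can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. It assumes that verb phrases are already segmented and tokenized. It also assumes that a two-word co-occurrence is the verb paired with the following word, and that an N-word co-occurrence is the set of all words in the verb phrase. Jaccard similarity with a fixed threshold stands in for the unspecified similarity score. The function names, the 0.5 threshold, and the example symptom patterns are hypothetical. The Support Vector Machine used for the two-word case is not reproduced here.

```python
from __future__ import annotations


def two_word_cooccurrence(vp: list[str]) -> tuple[str, str]:
    # Hypothetical: pair the verb (first token) with the word that follows it.
    if len(vp) < 2:
        raise ValueError("a two-word co-occurrence needs at least two tokens")
    return (vp[0], vp[1])


def n_word_cooccurrence(vp: list[str]) -> frozenset[str]:
    # Hypothetical: treat every word in the verb phrase as one co-occurrence set.
    return frozenset(vp)


def jaccard(a: frozenset[str], b: frozenset[str]) -> float:
    # Stand-in similarity score: shared words divided by all distinct words.
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def symptom_boundary(
    verb_phrases: list[list[str]],
    symptom_patterns: list[frozenset[str]],
    threshold: float = 0.5,  # hypothetical cut-off
) -> int:
    """Count the consecutive leading sentences whose verb phrase matches a symptom pattern."""
    end = 0
    for vp in verb_phrases:
        words = n_word_cooccurrence(vp)
        score = max((jaccard(words, p) for p in symptom_patterns), default=0.0)
        if score < threshold:
            break  # the first non-matching sentence closes the symptom explanation
        end += 1
    return end


# Hypothetical English-glossed verb phrases from consecutive simple sentences.
patterns = [frozenset({"have", "fever", "high"}), frozenset({"feel", "pain", "head"})]
sentences = [
    ["have", "fever", "high"],
    ["feel", "pain", "head", "severe"],
    ["go", "hospital"],
    ["have", "fever"],
]
print(two_word_cooccurrence(sentences[0]))   # ('have', 'fever')
print(symptom_boundary(sentences, patterns))  # 2
```

With this data the explanation boundary falls after the second sentence, because "go hospital" shares no words with either symptom pattern. Using a set of N words rather than a single pair lets a partially matching phrase such as "feel pain head severe" still score 0.75 against its pattern.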