Constructing Consumer-Oriented Medical Terminology from the Web: A Supervised Classifier Ensemble Approach

Increasingly, people turn to the Web with health-related questions and even for medical advice. Despite this rising demand, it remains non-trivial to access reliable consumer-oriented medical information for self-diagnosis, especially when presenting with multiple symptoms. In this project, we apply information extraction techniques to build a relational graph database of clinical entities that supports visualising and retrieving the relationships between symptoms and conditions. Since there are no readily available taxonomies of consumer-oriented medical terminology, the accuracy of the classification is of paramount importance. To ensure that medical terms on the Internet can be reliably classified into the proper semantic categories, we develop a method to identify the best-performing classifiers across multiple feature sets, and assess the effectiveness of combining these features using ensemble learning techniques. Experimental results confirm that the classifier ensemble, when intelligently configured, can provide significant increases in performance. An interactive web-based graph interface and a mobile app are developed to demonstrate the potential use of this consumer-oriented terminology.
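
As a rough illustration of the selection-then-combination idea summarised above, the following minimal sketch picks the most accurate candidate classifier for each feature set on a small validation sample and then combines the selected classifiers by majority vote. Everything in it is an assumption made for illustration only: the feature-set names, the rule-based toy classifiers, the four validation terms, and the choice of majority voting as the combination rule are hypothetical and are not taken from the method described here.

```python
from collections import Counter
from typing import Callable, Dict, List, Sequence

# A classifier maps a term to a semantic category (hypothetical interface).
Classifier = Callable[[str], str]


def accuracy(clf: Classifier, terms: Sequence[str], labels: Sequence[str]) -> float:
    """Fraction of terms whose predicted category matches the gold label."""
    if not terms:
        return 0.0
    return sum(1 for t, y in zip(terms, labels) if clf(t) == y) / len(terms)


def select_best(candidates: Dict[str, List[Classifier]],
                terms: Sequence[str],
                labels: Sequence[str]) -> Dict[str, Classifier]:
    """For each feature set, keep the candidate with the highest validation accuracy."""
    return {name: max(clfs, key=lambda c: accuracy(c, terms, labels))
            for name, clfs in candidates.items()}


def majority_vote(members: Sequence[Classifier], term: str) -> str:
    """Return the category predicted by most members (ties go to the first seen)."""
    votes = Counter(clf(term) for clf in members)
    return votes.most_common(1)[0][0]


# Toy rule-based classifiers standing in for trained models (all hypothetical).
def suffix_clf(term: str) -> str:
    return "condition" if term.endswith(("itis", "oma")) else "symptom"


def keyword_clf(term: str) -> str:
    return "symptom" if ("pain" in term or "ache" in term) else "condition"


def wordcount_clf(term: str) -> str:
    return "symptom" if len(term.split()) > 1 else "condition"


def always_symptom(term: str) -> str:
    return "symptom"


def always_condition(term: str) -> str:
    return "condition"


# Tiny illustrative validation sample, not real experimental data.
val_terms = ["arthritis", "headache", "chest pain", "melanoma"]
val_labels = ["condition", "symptom", "symptom", "condition"]

candidates = {
    "morphology": [always_symptom, suffix_clf],
    "lexical": [keyword_clf, always_condition],
    "shape": [wordcount_clf],
}

best = select_best(candidates, val_terms, val_labels)
members = list(best.values())

for term in ["headache", "bronchitis"]:
    print(term, "->", majority_vote(members, term))
```

In practice the toy rules would be replaced by trained models, and the voting rule could be swapped for weighted voting or stacking; the sketch only shows the shape of the selection and combination steps.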