Case Representation and Retrieval Techniques for Neuroanatomical Connectivity Extraction from PubMed

PubMed is a comprehensive database of abstracts and references for a large number of publications in the biomedical domain. Curated, structured connectivity databases provide an easy access point to the wealth of neuroanatomical connectivity information reported in the literature over the years. However, manual curation of such databases is time-consuming and labor-intensive. We present a Case-Based Reasoning (CBR) approach that automatically compiles the connectivity status between brain region mentions in text. We focus on the case retrieval part of the CBR cycle and present three instance-based learning techniques for retrieving similar cases from the case base. These techniques use varied case representations, ranging from surface-level features to richer syntax-based features. We experimented with diverse similarity measures and feature weighting schemes for each technique. The three techniques were evaluated and compared on a benchmark dataset from PubMed, and the one using deep syntactic features gave the best trade-off between precision and recall. In this study, we explore issues pertaining to the representation of, and retrieval over, textual cases. We envisage that the ideas presented in this paper can be adapted to the needs of other textual CBR domains as well.
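
To make the retrieval step concrete, the following is a minimal sketch of instance-based case retrieval over surface-level features. It is an illustration under stated assumptions, not the method evaluated in this paper: the bag-of-words representation, the weighted cosine similarity, the feature weights, the REGION_A/REGION_B placeholders, and the four toy cases are all hypothetical and chosen only to show how a similarity measure and a feature weighting scheme combine when retrieving the nearest cases.

```python
import math
from collections import Counter


def featurize(text):
    """Surface-level case representation: a bag of lowercase tokens."""
    return Counter(text.lower().split())


def weighted_cosine(a, b, weights):
    """Cosine similarity in which each feature is scaled by a weight.

    Features missing from `weights` default to a weight of 1.0.
    """
    dot = sum(weights.get(f, 1.0) * a[f] * b[f] for f in a if f in b)
    norm_a = math.sqrt(sum(weights.get(f, 1.0) * v * v for f, v in a.items()))
    norm_b = math.sqrt(sum(weights.get(f, 1.0) * v * v for f, v in b.items()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve(query, case_base, weights, k=3):
    """Return the k most similar cases as (similarity, label) pairs."""
    scored = [(weighted_cosine(query, feats, weights), label)
              for feats, label in case_base]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


def predict(query, case_base, weights, k=3):
    """Majority vote over the labels of the k retrieved cases."""
    votes = Counter(label for _, label in retrieve(query, case_base, weights, k))
    return votes.most_common(1)[0][0]


# Toy case base (invented for illustration). Brain region mentions are
# replaced by placeholders; the label says whether the sentence reports
# a connection between the two regions.
CASE_BASE = [
    (featurize("REGION_A projects to REGION_B"), True),
    (featurize("REGION_A sends afferents to REGION_B"), True),
    (featurize("REGION_A and REGION_B were both stained"), False),
    (featurize("lesions of REGION_A did not affect REGION_B"), False),
]

# Hypothetical weights: down-weight tokens that occur in nearly every case.
WEIGHTS = {"region_a": 0.1, "region_b": 0.1, "to": 0.5}

query = featurize("REGION_A projects densely to REGION_B")
neighbours = retrieve(query, CASE_BASE, WEIGHTS, k=3)
connected = predict(query, CASE_BASE, WEIGHTS, k=3)
```

In this sketch, swapping `featurize` for a function that emits syntax-based features (for example, dependency-path tokens) would change the case representation without touching the retrieval logic, which is one way the representations compared in this study could share a common retrieval interface.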