Advanced Information Retrieval Using XML Standards

The bulk of clinical data is available in electronic form. About 80% of this electronic data, however, is narrative text and is therefore of limited use for machine interpretation. As a result, the discussion has shifted from "electronic versus paper-based data" towards "structured versus unstructured electronic data". Current XML technology paves the way towards more structured clinical data, and several XML-based standards, such as the Clinical Document Architecture (CDA), are emerging. Implementing XML-based applications nevertheless remains a challenge. This paper focuses on XML retrieval issues and describes the difficulties and prospects of such an approach. The result of our work is a search technique called "topic matching", which exploits structured data to provide search quality superior to that of established text matching methods. With this solution we are able to utilize large numbers of heterogeneously structured documents with minimal effort.