Data discretization for novel resource discovery in large medical data sets

This paper is motivated by the problem of handling large data sets in information retrieval. The authors propose an information retrieval framework, based on mathematical principles, that organizes a retrieval set and lets end users manipulate it. By adjusting, through the interface, the weights and types of relationships between the query and set members, users can expose unanticipated, novel relationships between a query and the documents retrieved. The retrieval set as a whole is parsed into discrete concept-oriented subsets (based on within-set similarity measures) and displayed on screen as interactive "graphic nodes" in an information space, distributed initially according to the vector model (a measure of the similarity of set to query). The result is a visualized map in which main concept regions and multiple sub-regions can be identified as dimensions of the same data. Users may examine the membership of each sub-region. Based on this framework, a data visualization user interface was designed to encourage users to work with the data on multiple levels to find novel relationships between the query and retrieval set members. Space constraints prohibit addressing all aspects of this project.
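
The parsing and initial placement described above might be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the term weighting, the within-set similarity measure, or the grouping procedure, so the sketch assumes raw term-frequency vectors, cosine similarity, and a greedy threshold grouping. All function names and the `threshold` parameter are hypothetical.

```python
import math
from collections import Counter


def tf_vector(text):
    """Term-frequency vector from whitespace tokens (assumed weighting)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_by_similarity(vectors, threshold=0.3):
    """Greedy grouping (an assumption): a document joins the first subset
    whose seed document it resembles at or above the threshold; otherwise
    it starts a new subset."""
    subsets = []
    for i, vec in enumerate(vectors):
        for members in subsets:
            if cosine(vec, vectors[members[0]]) >= threshold:
                members.append(i)
                break
        else:
            subsets.append([i])
    return subsets


def build_nodes(query, documents, threshold=0.3):
    """Return one node per subset, ordered by similarity of subset to query."""
    q = tf_vector(query)
    vectors = [tf_vector(d) for d in documents]
    nodes = []
    for members in group_by_similarity(vectors, threshold):
        # Summed term counts stand in for the subset centroid; cosine
        # similarity is unaffected by the missing division by subset size.
        summed = Counter()
        for i in members:
            summed.update(vectors[i])
        nodes.append({"members": members,
                      "query_similarity": cosine(q, summed)})
    nodes.sort(key=lambda n: n["query_similarity"], reverse=True)
    return nodes
```

In an interface of the kind described, each returned node would correspond to one "graphic node", with `query_similarity` driving its initial position and `members` available for inspection; changing `threshold` or the weighting is one way a user adjustment could regroup the set.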