Selection of Passages for Information Reduction

There currently exists a bottleneck in extracting information from pre-existing texts to generate a symbolic representation of the text that can be used by a case-based reasoning (CBR) system. Symbolic case representations are used in the legal and medical domains, among others. Finding similar cases in the legal domain is crucial because of the important role precedents play when arguing a case. Further, by examining the features and decisions of previous cases, an advocate or judge can decide how to handle a current problem. In the medical domain, remembering or finding cases similar to the current patient’s may be key to making a correct diagnosis: such cases may provide insight into how an illness should be treated or which treatments may prove most effective. This thesis demonstrates methods of locating, automatically and quickly, those textual passages in previously unseen texts that relate to predefined important features. The important features are those defined for use by a CBR system as slots and fillers, and they constitute the frame-based representation of a text or case. Broadly, we use a set of textual “annotations” associated with each slot to generate an information retrieval (IR) query. Each query is aimed at locating the set of passages most likely to contain information about the slot under consideration. Currently, a user must read through many pages of text in order to find fillers for all the slots in a case-frame. This is a huge manual undertaking, particularly when there are fifty or more texts. Unfortunately, full-text understanding is not yet feasible as an alternative, and information extraction techniques themselves rely on large numbers of training texts with manually encoded answer keys. By locating and presenting relevant passages to the user, we significantly reduce the time and effort required.
Alternatively, we could save an automated information extraction system from processing an entire text by focusing the system on those portions of the text most likely to contain the desired information. This work integrates a case-based reasoner with an