Analysis of Methods for Novel Case Selection

Successful information retrieval with case-based reasoning (CBR) depends on a good similarity metric that can identify the cases most pertinent to a user's query in a potentially large case-base. Traditional CBR ranks cases by their similarity to the query and returns a small number of top-ranked cases. Because the retrieved cases are generally very well matched to the query, there is a high probability that they are also strongly similar to each other, so the range of choice offered in the retrieved list is limited. In response to this issue, some recent research has considered methods to increase the intra-dissimilarity, or diversity, of the retrieved set. In this paper, we study this problem in detail. We argue that the motivation for diversity strategies is to increase the probability of retrieving unusual or novel cases, and we introduce a methodology that allows their performance in terms of novel case retrieval to be evaluated. Moreover, we formulate the trade-off between diversity and matching quality as a binary optimisation problem, with an input control parameter allowing explicit tuning of this trade-off. We study solution strategies for the optimisation problem and demonstrate the importance of the control parameter in obtaining the desired system performance. The methods discussed are evaluated on synthetic data based on the publicly available Travel case-base data set.
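
To make the trade-off between matching quality and diversity concrete, the following is a minimal illustrative sketch and not the formulation studied in this paper. It assumes cases are numeric feature vectors, uses a hypothetical similarity of 1 / (1 + Euclidean distance), and replaces the binary optimisation with a simple greedy heuristic. The names `similarity`, `select_cases` and `theta` are assumptions introduced for illustration only; `theta` plays the role of the control parameter.

```python
from math import sqrt


def similarity(a, b):
    """Assumed similarity in (0, 1]: 1 / (1 + Euclidean distance)."""
    return 1.0 / (1.0 + sqrt(sum((x - y) ** 2 for x, y in zip(a, b))))


def select_cases(query, cases, k, theta):
    """Greedily select the indices of k cases (a heuristic sketch).

    theta = 0 scores candidates by similarity to the query only.
    theta = 1 scores candidates by average dissimilarity to the cases
    already selected; the first pick is always the best match to the query.
    """
    selected = []
    remaining = list(range(len(cases)))
    while remaining and len(selected) < k:

        def score(i):
            match = similarity(query, cases[i])
            if not selected:
                return match
            # Average dissimilarity between candidate i and the selected set.
            diversity = sum(
                1.0 - similarity(cases[i], cases[j]) for j in selected
            ) / len(selected)
            return (1.0 - theta) * match + theta * diversity

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data (made up for illustration): three near-duplicates and one outlier.
query = (1.0, 1.0)
cases = [(1.0, 1.0), (1.1, 1.0), (1.0, 1.2), (5.0, 5.0)]

print(select_cases(query, cases, 2, 0.0))  # [0, 1]: the two closest matches
print(select_cases(query, cases, 2, 1.0))  # [0, 3]: best match plus the outlier
```

With `theta` at 0 the retrieved list contains only near-duplicates of the best match, while with `theta` at 1 the second slot goes to the case least like the first. Intermediate values interpolate between these two behaviours, which is the kind of explicit tuning the control parameter is intended to provide.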