Interactive query formulation and feedback experiments in information retrieval

The effective use of information retrieval systems by end-users has been limited by their lack of knowledge of the particular organization of the databases searched and by their limited experience in formulating and modifying search statements. This thesis explores and evaluates two mechanisms for improving retrieval performance by end-users. The first mechanism complements query formulation by allowing users to interactively add term phrases. These phrases are generated either from the query text or from known relevant documents. The addition of term phrases to a query is suggested by the term discrimination model as a precision-enhancing device. An interactive front-end for the SMART information retrieval system was developed to perform the interactive experiments needed to evaluate different phrase-addition strategies. The second aspect of retrieval improvement studied is the evaluation of two database organizations that can be used to obtain new relevant documents by looking in the neighborhood of known relevant documents, i.e., by browsing. Browsing in cluster hierarchies and in nearest-neighbor networks is compared to relevance feedback in non-interactive experiments. The results for the phrase-addition methodology showed that simple non-interactive addition of phrases can perform as well as interactive addition. Even an optimal selection of phrases, based on the relevant documents not yet retrieved, did not significantly improve performance over simply adding all the phrases generated. Many useful phrases are not selected by users because they look like random associations of terms. These phrases are nevertheless useful because they are either fragments of larger, semantically meaningful phrases or combinations of local synonyms specific to the document collection used.
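The non-interactive phrase-addition strategy described above — generating all candidate phrases from the query text and adding them without user selection — can be sketched roughly as follows. This is an illustrative reconstruction, not the SMART implementation; the tokenizer, stopword list, and adjacent-pair phrase rule are all simplifying assumptions.

```python
# Illustrative sketch of non-interactive phrase addition from query text.
# The stopword list and adjacent-pair rule are assumptions, not SMART's.
STOPWORDS = {"the", "of", "a", "in", "and", "to", "on", "by", "for"}

def query_terms(text):
    """Tokenize a query into lowercase content words."""
    return [w for w in text.lower().split() if w not in STOPWORDS]

def candidate_phrases(text):
    """Form a candidate phrase from every pair of adjacent content words.

    No user selection is applied: every generated phrase is kept,
    mirroring the simple non-interactive strategy.
    """
    terms = query_terms(text)
    return [f"{a} {b}" for a, b in zip(terms, terms[1:])]

def expand_query(text):
    """Augment the single-term query with all generated phrases."""
    return query_terms(text) + candidate_phrases(text)

expanded = expand_query("information retrieval in large document collections")
```

Note that a pair such as "retrieval large" looks like a random association of terms, which is exactly why a user might reject it interactively even when it helps retrieval.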
The browsing experiments in cluster hierarchies and nearest-neighbor networks showed that the latter organization consistently performs better than relevance feedback across different collections. Cluster browsing is more dependent on the characteristics of the collection, but when circumstances are favorable it can produce larger improvements in retrieval than network browsing. Retrieval in both structures is much faster than relevance feedback, since only a small portion of the database needs to be inspected.
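The nearest-neighbor browsing idea can be sketched as follows: each document's nearest neighbors are precomputed into a network, and browsing from a known relevant document inspects only that neighborhood rather than rescoring the whole collection. This is a minimal sketch under assumed cosine similarity over sparse term-weight vectors; the document vectors and neighborhood size are hypothetical.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse term-weight dicts."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def build_nn_network(docs, k=2):
    """Precompute each document's k nearest neighbors (the network)."""
    network = {}
    for i, d in docs.items():
        sims = [(cosine(d, e), j) for j, e in docs.items() if j != i]
        network[i] = [j for _, j in sorted(sims, reverse=True)[:k]]
    return network

def browse(network, relevant_doc_id):
    """Browsing: inspect only the neighborhood of a known relevant doc."""
    return network[relevant_doc_id]

# Toy collection (hypothetical term weights).
docs = {
    "d1": {"cluster": 1.0, "browsing": 0.8},
    "d2": {"cluster": 0.9, "hierarchy": 0.7},
    "d3": {"feedback": 1.0, "relevance": 0.9},
}
net = build_nn_network(docs, k=1)
```

Because the network is precomputed, each browsing step touches only k neighbors, which is the source of the speed advantage over relevance feedback's full-collection rescoring.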