Classification Clustering, Probabilistic Information Retrieval, and the Online Catalog

Research into online catalog use and users has found pervasive problems with subject searching in these systems. Subject searches too often fail to retrieve anything, and those that do succeed often retrieve "too much" material. This article examines these problems and how they might be remedied. It discusses the theoretical principles for the design of effective information retrieval systems and describes an experimental online catalog system based on these principles. The system, CHESHIRE, combines a method called "classification clustering" with probabilistic retrieval techniques to provide natural language searching, which helps to reduce search failure, and to give effective control of "information overload" in subject searching.