RESEARCH IN DOCUMENT CLASSIFICATION AND FILE ORGANIZATION

Abstract: In an information storage and retrieval system, classification provides a means of organizing a mass of material into groups so that related items are brought together in a systematic fashion. By grouping documents into categories, the number of items to be scanned in response to a search request can be reduced and the efficiency of the system increased. To design a classification system, one must specify the number of classes to be established and the principle to be used in determining class membership. A number of mathematical procedures have been suggested for devising classification schedules, including factor analysis, clump theory, and latent class analysis. It has also been suggested that these or similar techniques can be used to classify documents automatically into the correct categories and so speed up the processing of incoming material. The initial results appear promising. Further research is being undertaken to determine the retrieval effectiveness of automated indexing and classification systems. (Author)
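
The claim that grouping documents into categories reduces the number of items scanned can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy collection, the class labels, and the function names build_class_index and search are hypothetical and do not come from the report, and the class assignments are given by hand rather than derived by factor analysis, clump theory, or latent class analysis.

```python
from collections import defaultdict

# Hypothetical toy collection: (document id, assigned class, text).
# The class labels are hand-assigned for illustration only.
DOCUMENTS = [
    (1, "chemistry", "synthesis of organic polymers"),
    (2, "chemistry", "catalysis in polymer reactions"),
    (3, "computing", "automatic indexing of documents"),
    (4, "computing", "file organization for retrieval systems"),
    (5, "physics", "neutron scattering in crystals"),
]


def build_class_index(documents):
    """Group documents by class so a search can be limited to one group."""
    index = defaultdict(list)
    for doc_id, doc_class, text in documents:
        index[doc_class].append((doc_id, text))
    return index


def search(index, term, doc_class=None):
    """Return (matching document ids, number of documents scanned).

    With doc_class=None every document is scanned; otherwise only the
    documents filed under that class are scanned.
    """
    if doc_class is None:
        candidates = [doc for docs in index.values() for doc in docs]
    else:
        candidates = index.get(doc_class, [])
    hits = [doc_id for doc_id, text in candidates if term in text.split()]
    return hits, len(candidates)


index = build_class_index(DOCUMENTS)
full_hits, full_scanned = search(index, "retrieval")
class_hits, class_scanned = search(index, "retrieval", doc_class="computing")
print(full_hits, full_scanned)
print(class_hits, class_scanned)
```

In this sketch both searches find the same document, but the class-restricted search examines only the documents filed under one class. The saving depends on the request being routed to the right class, which is why the principle used to determine class membership matters.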