Automatic Document Classification

Abstract: Starting with a collection of 405 document abstracts dealing with computers, this experiment in automatic document classification constructs an empirically based, mathematically derived classification system by means of factor analysis. Five subjects then classify the documents into the derived categories, and the resulting classification serves as the criterion against which the automatic classification is evaluated. Of the 90 documents in the Validation Group that contained two or more clue words, and that could therefore be classified automatically, 44 documents (48.9%) were placed into their correct categories by a computer formula. These results are almost identical to those obtained by Maron in a previous experiment that used the same data but a different set of classification categories and a different computational formula. The experimental evidence supports the conclusion that automatic document classification is possible. Additional experiments are described which, when executed, should improve the accuracy of the automatic classification technique. (Author)
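
The accuracy figure in the abstract is computed over only those Validation Group documents that could be classified automatically, that is, those containing two or more clue words. The short Python sketch below reproduces that arithmetic. The function names and the min_clue_words parameter are hypothetical, introduced here for illustration; the abstract does not give the classification formula itself, so none is shown.

```python
def is_classifiable(clue_word_count: int, min_clue_words: int = 2) -> bool:
    """A document can be classified automatically only if it contains
    at least min_clue_words clue words (two, per the abstract)."""
    return clue_word_count >= min_clue_words


def classifiable_accuracy(n_correct: int, n_classifiable: int) -> float:
    """Percentage of classifiable documents placed in the category
    assigned by the human criterion classification."""
    if n_classifiable <= 0:
        raise ValueError("need at least one classifiable document")
    return 100.0 * n_correct / n_classifiable


# Figures reported in the abstract: 44 correct out of 90 classifiable.
print(round(classifiable_accuracy(44, 90), 1))  # 48.9
```

Note that the denominator is the 90 classifiable documents, not the whole Validation Group, so the percentage says nothing about documents with fewer than two clue words.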