Information retrieval using letter tuples with neural network and nearest neighbor classifiers

Previous work has shown that statistics of letter tuples extracted from text samples can be effective in determining authorship. These statistics have been used to create displays that visually separate the works of different authors, and as input to neural network classifiers that accurately discriminate between authors. Similar applications are described by Bennett (1976), Clausing (1993), and Damashek (1995). The present paper extends this work by testing the effectiveness of letter tuples in information retrieval, using neural network classifiers and nearest neighbor classifiers as the retrieval methods. Testing was performed on 855 full-text Wall Street Journal articles and 50 narrative queries. The neural and nearest neighbor methods performed similarly, with the product of recall and precision exceeding 0.1 on this data.
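
To make the idea of retrieval from letter-tuple statistics concrete, the following is a minimal sketch of the nearest neighbor case only. It is not the paper's implementation: the tuple length of 3, the lowercasing and whitespace handling, the use of relative frequencies, and cosine similarity as the distance measure are all assumptions chosen for illustration, and the function names tuple_profile, cosine, and retrieve are hypothetical.

```python
from collections import Counter
import math


def tuple_profile(text, n=3):
    """Relative frequencies of overlapping letter n-tuples in text.

    Assumption: non-letters are treated as spaces, runs of spaces are
    collapsed, and case is ignored. The source does not specify this.
    """
    cleaned = "".join(c.lower() if c.isalpha() else " " for c in text)
    cleaned = " ".join(cleaned.split())
    counts = Counter(cleaned[i:i + n] for i in range(len(cleaned) - n + 1))
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


def cosine(p, q):
    """Cosine similarity between two sparse tuple-frequency profiles."""
    dot = sum(v * q.get(t, 0.0) for t, v in p.items())
    norm_p = math.sqrt(sum(v * v for v in p.values()))
    norm_q = math.sqrt(sum(v * v for v in q.values()))
    return dot / (norm_p * norm_q) if norm_p and norm_q else 0.0


def retrieve(query, documents, k=1, n=3):
    """Return the indices of the k documents most similar to the query."""
    q = tuple_profile(query, n)
    scores = [(cosine(q, tuple_profile(d, n)), i) for i, d in enumerate(documents)]
    # Highest similarity first; ties broken by document order.
    scores.sort(key=lambda s: (-s[0], s[1]))
    return [i for _, i in scores[:k]]


documents = [
    "Stock prices rose sharply on Wall Street today.",
    "The recipe calls for two cups of flour and sugar.",
]
best = retrieve("stock market prices", documents, k=1)
```

Here the query shares many letter triples with the first document (for example "sto", "ock", "pri") and none with the second, so the first document is ranked nearest. A neural network classifier would take the same kind of tuple-frequency vector as its input.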