Authorship attribution of text samples using neural networks and Bayesian classifiers

Previous work has shown that statistics of letter pairs extracted from text samples can be effective in discriminating between two authors writing in a similar style. This paper extends that work to letter n-tuples for n from 1 to 5. The features used in classification are the relative frequencies of the tuples, transformed with a Karhunen-Loève (KL) transform. Both three-layer neural network classifiers and Bayesian classifiers are used with these features to classify text samples from two similar authors. The most effective combination was 2-tuples used with a neural network classifier, although other combinations did nearly as well.
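
The feature extraction step described above can be illustrated with a minimal sketch. It computes the relative frequencies of overlapping letter n-tuples for n from 1 to 5. The function name `ntuple_relative_frequencies`, the sample sentence, and the preprocessing (lowercasing and dropping every character outside a-z) are assumptions made for illustration; the abstract does not specify them. The KL transform and both classifiers are left out.

```python
from collections import Counter


def ntuple_relative_frequencies(text, n):
    """Return relative frequencies of overlapping letter n-tuples in `text`.

    Assumption (not from the paper): the text is lowercased and every
    character other than a-z is dropped before tuples are counted.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    # Keep only the letters a-z, so tuples span word boundaries.
    letters = [ch for ch in text.lower() if "a" <= ch <= "z"]
    # Slide a window of width n over the letter sequence.
    tuples = ["".join(letters[i:i + n]) for i in range(len(letters) - n + 1)]
    total = len(tuples)
    if total == 0:
        return {}  # sample too short to contain any n-tuple
    counts = Counter(tuples)
    # Divide each count by the number of tuples so the values sum to 1.
    return {tup: count / total for tup, count in counts.items()}


# Hypothetical sample text; one feature dictionary per tuple length 1..5.
sample = "The cat sat on the mat."
features = {n: ntuple_relative_frequencies(sample, n) for n in range(1, 6)}
```

Each dictionary in `features` would then be mapped onto a fixed-length vector (one entry per tuple in a chosen vocabulary) before any transform or classifier is applied; how that vocabulary is chosen is not stated in the abstract.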